- 27 4月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
In the next patch, we require the engine vfuncs setup prior to initialising the pinned kernel contexts, so split the vfunc setup from the engine initialisation and call it earlier. v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup() Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-6-chris@chris-wilson.co.uk
-
- 25 4月, 2019 3 次提交
-
-
由 Chris Wilson 提交于
In the current scheme, on submitting a request we take a single global GEM wakeref, which trickles down to wake up all GT power domains. This is undesirable as we would like to be able to localise our power management to the available power domains and to remove the global GEM operations from the heart of the driver. (The intent there is to push global GEM decisions to the boundary as used by the GEM user interface.) Now during request construction, each request is responsible via its logical context to acquire a wakeref on each power domain it intends to utilize. Currently, each request takes a wakeref on the engine(s) and the engines themselves take a chipset wakeref. This gives us a transition on each engine which we can extend if we want to insert more powermangement control (such as soft rc6). The global GEM operations that currently require a struct_mutex are reduced to listening to pm events from the chipset GT wakeref. As we reduce the struct_mutex requirement, these listeners should evaporate. Perhaps the biggest immediate change is that this removes the struct_mutex requirement around GT power management, allowing us greater flexibility in request construction. Another important knock-on effect, is that by tracking engine usage, we can insert a switch back to the kernel context on that engine immediately, avoiding any extra delay or inserting global synchronisation barriers. This makes tracking when an engine and its associated contexts are idle much easier -- important for when we forgo our assumed execution ordering and need idle barriers to unpin used contexts. In the process, it means we remove a large chunk of code whose only purpose was to switch back to the kernel context. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-5-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
We wish to start segregating the power management into different control domains, both with respect to the hardware and the user interface. The first step is that at the lowest level flow of requests, we want to process a context event (and not a global GEM operation). In this patch, we introduce the context callbacks that in future patches will be redirected to per-engine interfaces leading to global operations as required. The intent is that this will be guarded by the timeline->mutex, except that retiring has not quite finished transitioning over from being guarded by struct_mutex. So at the moment it is protected by struct_mutex with a reminded to switch. v2: Rename default handlers to intel_context_enter_engine. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: NJani Nikula <jani.nikula@intel.com> Acked-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk
-
- 24 4月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
As we push for better compartmentalisation, it is more convenient to copy the default sseu configuration from the engine into the derived logical context, than it is to dig it out from i915->runtime_info. v2: Use intel_sseu_from_device_info() to describe the converter Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk
-
- 12 4月, 2019 3 次提交
-
-
由 Chris Wilson 提交于
The HW resets it CSB tail pointer on resetting the engine. Most of the time. In case it doesn't (and for system resume) we write the expected value anyway. For extra paranoia, flush the write before we invalidate the cacheline. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190412110159.10495-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
During reset, we try and stop the active ring. This has the consequence that we often clobber the RING registers within the context image. When we find an active request, we update the context image to rerun that request (if it was guilty, we replace the hanging user payload with NOPs). However, we were ignoring an active context if the request had completed, with the consequence that the next submission on that request would start with RING_HEAD==0 and not the tail of the previous request, causing all requests still in the ring to be rerun. Rare, but occasionally seen within CI where we would spot that the context seqno would reverse and complain that we were retiring an incomplete request. <0> [412.390350] <idle>-0 3d.s2 408373352us : __i915_request_submit: rcs0 fence 1e95b:3640 -> current 3638 <0> [412.390350] <idle>-0 3d.s2 408373353us : __i915_request_submit: rcs0 fence 1e95b:3642 -> current 3638 <0> [412.390350] <idle>-0 3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3644 -> current 3638 <0> [412.390350] <idle>-0 3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3646 -> current 3638 <0> [412.390350] <idle>-0 3d.s2 408373356us : __execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, fence 1e95b:3646 (current 3638), prio=4 <0> [412.390350] i915_sel-4613 0.... 408373374us : __i915_request_commit: rcs0 fence 1e95b:3648 <0> [412.390350] i915_sel-4613 0d..1 408373377us : process_csb: rcs0 cs-irq head=2, tail=3 <0> [412.390350] i915_sel-4613 0d..1 408373377us : process_csb: rcs0 csb[3]: status=0x00000001:0x00000000, active=0x1 <0> [412.390350] i915_sel-4613 0d..1 408373378us : __i915_request_submit: rcs0 fence 1e95b:3648 -> current 3638 <0> [412.390350] <idle>-0 3..s1 408373378us : execlists_submission_tasklet: rcs0 awake?=1, active=5 <0> [412.390350] i915_sel-4613 0d..1 408373379us : __execlists_submission_tasklet: rcs0 in[0]: ctx=2.2, fence 1e95b:3648 (current 3638), prio=4 <0> [412.390350] i915_sel-4613 0.... 408373381us : i915_reset_engine: rcs0 flags=4 <0> [412.390350] i915_sel-4613 0.... 408373382us : execlists_reset_prepare: rcs0: depth<-0 <0> [412.390350] <idle>-0 3d.s2 408373390us : process_csb: rcs0 cs-irq head=3, tail=4 <0> [412.390350] <idle>-0 3d.s2 408373390us : process_csb: rcs0 csb[4]: status=0x00008002:0x00000002, active=0x1 <0> [412.390350] <idle>-0 3d.s2 408373390us : process_csb: rcs0 out[0]: ctx=2.2, fence 1e95b:3648 (current 3640), prio=4 <0> [412.390350] i915_sel-4613 0.... 408373401us : intel_engine_stop_cs: rcs0 <0> [412.390350] i915_sel-4613 0d..1 408373402us : process_csb: rcs0 cs-irq head=4, tail=4 <0> [412.390350] i915_sel-4613 0.... 408373403us : intel_gpu_reset: engine_mask=1 <0> [412.390350] i915_sel-4613 0d..1 408373408us : execlists_cancel_port_requests: rcs0:port0 fence 1e95b:3648, (current 3648) <0> [412.390350] i915_sel-4613 0.... 408373442us : intel_engine_cancel_stop_cs: rcs0 <0> [412.390350] i915_sel-4613 0.... 408373442us : execlists_reset_finish: rcs0: depth->0 <0> [412.390350] ksoftirq-26 3..s. 408373442us : execlists_submission_tasklet: rcs0 awake?=1, active=0 <0> [412.390350] ksoftirq-26 3d.s1 408373443us : process_csb: rcs0 cs-irq head=5, tail=5 <0> [412.390350] i915_sel-4613 0.... 408373475us : i915_request_retire: rcs0 fence 1e95b:3640, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373476us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3640, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373494us : __i915_request_commit: rcs0 fence 1e95b:3650 <0> [412.390350] i915_sel-4613 0d..1 408373496us : process_csb: rcs0 cs-irq head=5, tail=5 <0> [412.390350] i915_sel-4613 0d..1 408373496us : __i915_request_submit: rcs0 fence 1e95b:3650 -> current 3648 <0> [412.390350] i915_sel-4613 0d..1 408373498us : __execlists_submission_tasklet: rcs0 in[0]: ctx=2.1, fence 1e95b:3650 (current 3648), prio=6 <0> [412.390350] i915_sel-4613 0.... 408373500us : i915_request_retire_upto: rcs0 fence 1e95b:3648, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373500us : i915_request_retire: rcs0 fence 1e95b:3642, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373501us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3642, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373514us : i915_request_retire: rcs0 fence 1e95b:3644, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373515us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3644, current 3648 <0> [412.390350] i915_sel-4613 0.... 408373527us : i915_request_retire: rcs0 fence 1e95b:3646, current 3640 <0> [412.390350] <idle>-0 3..s1 408373569us : execlists_submission_tasklet: rcs0 awake?=1, active=1 <0> [412.390350] <idle>-0 3d.s2 408373569us : process_csb: rcs0 cs-irq head=5, tail=1 <0> [412.390350] <idle>-0 3d.s2 408373570us : process_csb: rcs0 csb[0]: status=0x00000001:0x00000000, active=0x1 <0> [412.390350] <idle>-0 3d.s2 408373570us : process_csb: rcs0 csb[1]: status=0x00000018:0x00000002, active=0x5 <0> [412.390350] <idle>-0 3d.s2 408373570us : process_csb: rcs0 out[0]: ctx=2.1, fence 1e95b:3650 (current 3650), prio=6 <0> [412.390350] <idle>-0 3d.s2 408373571us : process_csb: rcs0 completed ctx=2 <0> [412.390350] i915_sel-4613 0.... 408373621us : i915_request_retire: i915_request_retire:253 GEM_BUG_ON(!i915_request_completed(request)) v2: Fixup the cancellation path to drain the CSB and reset the pointers. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Before causing guc and execlists to diverge further (breaking guc in the process), take a copy of the current reset procedure and make it local to the guc submission backend Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190411130515.20716-1-chris@chris-wilson.co.uk
-
- 11 4月, 2019 3 次提交
-
-
由 Mika Kuoppala 提交于
Now when we can support variable csb fifo sizes, disable legacy mode. By disabling legacy we hope to get better hw testing coverage by assuming everyone else have switched over. v2: rebase References: https://bugs.freedesktop.org/show_bug.cgi?id=110338 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Kelvin Gardiner <kelvin.gardiner@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190405204657.12887-2-chris@chris-wilson.co.uk
-
由 Mika Kuoppala 提交于
Make csb entry count variable in preparation for larger CSB status FIFO size found on gen11+ hardware. v2: adapt to hwsp access only (Chris) non continuous mmio (Daniele) v3: entries (Chris), fix macro for checkpatch v4: num_entries (Chris) v5: consistency on num_entries Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190405204657.12887-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
On resume, we know that the only pinned contexts in danger of seeing corruption are the kernel context, and so we do not need to walk the list of all GEM contexts as we tracked them on each engine. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190410190120.830-1-chris@chris-wilson.co.uk
-
- 05 4月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
The PDP registers are an oddity inside the set of context saved registers in that they take the engine as a parameter to the macro and not the mmio_base as the others do. Make it accept the engine->mmio_base for consistency in programming the context registers. add/remove: 0/0 grow/shrink: 2/1 up/down: 3/-32 (-29) Function old new delta emit_ppgtt_update 324 326 +2 capture 5102 5103 +1 execlists_init_reg_state.isra 1128 1096 -32 And similar savings later! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190405123831.9724-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When we introduced preemption, we chose to keep it disabled for gen8 as supporting preemption inside GPGPU user batches required various w/a in userspace. Since then, the desire to preempt long queues of requests between batches (e.g. within busywaiting semaphores) has grown. So allow arbitration within the busywaits and between requests, but disable arbitration within user batches so that we can preempt between requests and not risk breaking GPGPU. However, since this preemption is much coarser and doesn't interfere with userspace, we decline to include it amongst the scheduler capabilities. (This is also required for us to skip over the preemption selftests that expect to be able to preempt user batches.) Michal suggested that we could perhaps allow preemption inside gen8 userspace batches if we can satisfy ourselves that the default preemption settings are viable with existing userspace (principally OpenCL which already should carry any known workaround). We could then merge the two code paths back into one, even dropping the artifical has-preemption device feature flag. Testcase: igt/gem_exec_scheduler/semaphore-user References: beecec90 ("drm/i915/execlists: Preemption!") Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Michal Winiarski <michal.winiarski@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Michal Winiarski <michal.winiarski@intel.com> #irc Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190329134024.5254-1-chris@chris-wilson.co.uk
-
- 27 3月, 2019 2 次提交
-
-
由 Zhenyu Wang 提交于
This is to disable semaphore usage when on vGPU for now. Unfortunately GVT-g hasn't fully enabled semaphore usage yet, so current guest with semaphore use would cause vGPU failure. Although current semaphore failure with vGPU can be simply resolved by allowing cmd parser to accept MI_SEMAPHORE_WAIT command with address audit, we're checking general usage of semaphore and how we should handle it properly for virtualization in consider of function and security concern. So we decide to request to disable it for now in guest driver. Once GVT could support it, we would add new compat bit to turn it on. Fixes: e8861964 ("drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+") #vgpu Cc: Kevin Tian <kevin.tian@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com> Acked-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190327090636.3547-1-zhenyuw@linux.intel.com
-
由 Daniele Ceraolo Spurio 提交于
A few advantages: - Prepares us for the planned split of display uncore from GT uncore - Improves our engine-centric view of the world in the engine code and allows us to avoid jumping back to dev_priv. - Allows us to wrap accesses to engine register in nice macros that automatically pick the right mmio base. Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-10-daniele.ceraolospurio@intel.com
-
- 22 3月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Previously, our view has been always to run the engines independently within a context. (Multiple engines happened before we had contexts and timelines, so they always operated independently and that behaviour persisted into contexts.) However, at the user level the context often represents a single timeline (e.g. GL contexts) and userspace must ensure that the individual engines are serialised to present that ordering to the client (or forgot about this detail entirely and hope no one notices - a fair ploy if the client can only directly control one engine themselves ;) In the next patch, we will want to construct a set of engines that operate as one, that have a single timeline interwoven between them, to present a single virtual engine to the user. (They submit to the virtual engine, then we decide which engine to execute on based.) To that end, we want to be able to create contexts which have a single timeline (fence context) shared between all engines, rather than multiple timelines. v2: Move the specialised timeline ordering to its own function. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When we return pages to the system, we ensure that they are marked as being in the CPU domain since any external access is uncontrolled and we must assume the worst. This means that we need to always flush the pages on acquisition if we need to use them on the GPU, and from the beginning have used set-domain. Set-domain is overkill for the purpose as it is a general synchronisation barrier, but our intent is to only flush the pages being swapped in. If we move that flush into the pages acquisition phase, we know then that when we have obj->mm.pages, they are coherent with the GPU and need only maintain that status without resorting to heavy handed use of set-domain. The principle knock-on effect for userspace is through mmap-gtt pagefaulting. Our uAPI has always implied that the GTT mmap was async (especially as when any pagefault occurs is unpredicatable to userspace) and so userspace had to apply explicit domain control itself (set-domain). However, swapping is transparent to the kernel, and so on first fault we need to acquire the pages and make them coherent for access through the GTT. Our use of set-domain here leaks into the uABI that the first pagefault was synchronous. This is unintentional and baring a few igt should be unoticed, nevertheless we bump the uABI version for mmap-gtt to reflect the change in behaviour. Another implication of the change is that gem_create() is presumed to create an object that is coherent with the CPU and is in the CPU write domain, so a set-domain(CPU) following a gem_create() would be a minor operation that merely checked whether we could allocate all pages for the object. On applying this change, a set-domain(CPU) causes a clflush as we acquire the pages. This will have a small impact on mesa as we move the clflush here on !llc from execbuf time to create, but that should have minimal performance impact as the same clflush exists but is now done early and because of the clflush issue, userspace recycles bo and so should resist allocating fresh objects. Internally, the presumption that objects are created in the CPU write-domain and remain so through writes to obj->mm.mapping is more prevalent than I expected; but easy enough to catch and apply a manual flush. For the future, we should push the page flush from the central set_pages() into the callers so that we can more finely control when it is applied, but for now doing it one location is easier to validate, at the cost of sometimes flushing when there is no need. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
-
- 21 3月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
The timeline->name is only used for convenience in pretty printing the i915_request.fence->ops->get_timeline_name() and it is just as convenient to pull it from the gem_context directly. The few instances of its use inside GEM_TRACE() has proven more of a nuisance than helpful, so not worth saving imo. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-4-chris@chris-wilson.co.uk
-
由 Daniele Ceraolo Spurio 提交于
This will allow futher simplifications in the uncore handling. v2: move register access setup under uncore (Chris) Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-8-daniele.ceraolospurio@intel.com
-
- 19 3月, 2019 4 次提交
-
-
由 Chris Wilson 提交于
For virtual engines, we need to keep the HW context alive while it remains in use. For regular HW contexts, they are created and kept alive until the end of the GEM context. For simplicity, generalise the requirements and keep an active reference to each HW context. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-2-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
On unpinning the intel_context, we remove it from the active list inside the GEM context. This list is supposed to be guarded by the GEM context mutex, so remember to take it! Fixes: 7e3d9a59 ("drm/i915: Track active engines within a context") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318212347.30146-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
As the final request on a ring may hold the reference to this ring (via retiring the last pinned context), we may find ourselves chasing a dangling pointer on completion of the list. A quick solution is to hold a reference to the ring itself as we retire along it so that we only free it after we stop dereferencing it. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
If we use the STORE_DATA_INDEX function we can use a fixed offset and avoid having to lookup up the engine HWS address. A step closer to being able to emit the final breadcrumb during request_add rather than later in the submission interrupt handler. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-9-chris@chris-wilson.co.uk
-
- 15 3月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
With direct submission being disabled while the reset in progress, we have a small window where we may forgo the submission of a new request and not notice its addition during execlists_reset_finish. To close this window, always schedule the submission tasklet on coming out of reset to catch any residual work. <6> [333.144082] i915: Running intel_hangcheck_live_selftests/igt_reset_engines <3> [333.296927] i915_reset_engine(rcs0:idle): failed to idle after reset <6> [333.296932] i915 0000:00:02.0: [drm] rcs0 <6> [333.296934] i915 0000:00:02.0: [drm] Hangcheck 0:a9ddf7a5 [4157 ms] <6> [333.296936] i915 0000:00:02.0: [drm] Reset count: 36048 (global 754) <6> [333.296938] i915 0000:00:02.0: [drm] Requests: <6> [333.296997] i915 0000:00:02.0: [drm] RING_START: 0x00000000 <6> [333.296999] i915 0000:00:02.0: [drm] RING_HEAD: 0x00000000 <6> [333.297001] i915 0000:00:02.0: [drm] RING_TAIL: 0x00000000 <6> [333.297003] i915 0000:00:02.0: [drm] RING_CTL: 0x00000000 <6> [333.297005] i915 0000:00:02.0: [drm] RING_MODE: 0x00000200 [idle] <6> [333.297007] i915 0000:00:02.0: [drm] RING_IMR: fffffeff <6> [333.297010] i915 0000:00:02.0: [drm] ACTHD: 0x00000000_00000000 <6> [333.297012] i915 0000:00:02.0: [drm] BBADDR: 0x00000000_00000000 <6> [333.297015] i915 0000:00:02.0: [drm] DMA_FADDR: 0x00000000_00000000 <6> [333.297017] i915 0000:00:02.0: [drm] IPEIR: 0x00000000 <6> [333.297019] i915 0000:00:02.0: [drm] IPEHR: 0x00000000 <6> [333.297021] i915 0000:00:02.0: [drm] Execlist status: 0x00000001 00000000 <6> [333.297023] i915 0000:00:02.0: [drm] Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (enabled) <6> [333.297025] i915 0000:00:02.0: [drm] ELSP[0] idle <6> [333.297027] i915 0000:00:02.0: [drm] ELSP[1] idle <6> [333.297028] i915 0000:00:02.0: [drm] HW active? 0x0 <6> [333.297044] i915 0000:00:02.0: [drm] Queue priority hint: -8186 <6> [333.297067] i915 0000:00:02.0: [drm] Q 2afac:5f2+ prio=-8186 @ 50ms: (null) <6> [333.297068] i915 0000:00:02.0: [drm] HWSP: <6> [333.297071] i915 0000:00:02.0: [drm] [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6> [333.297073] i915 0000:00:02.0: [drm] * <6> [333.297075] i915 0000:00:02.0: [drm] [0040] 00000001 00000000 00000018 00000002 00000001 00000000 00000018 00000000 <6> [333.297077] i915 0000:00:02.0: [drm] [0060] 00000001 00000000 00008002 00000002 00000000 00000000 00000000 00000005 <6> [333.297079] i915 0000:00:02.0: [drm] [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6> [333.297081] i915 0000:00:02.0: [drm] * <6> [333.297083] i915 0000:00:02.0: [drm] [00c0] 00000000 00000000 00000000 00000000 a9ddf7a5 00000000 00000000 00000000 <6> [333.297085] i915 0000:00:02.0: [drm] [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <6> [333.297087] i915 0000:00:02.0: [drm] * <6> [333.297089] i915 0000:00:02.0: [drm] Idle? no <6> [333.297090] i915_reset_engine(rcs0:idle): 3000 resets <3> [333.297092] i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -5 <3> [333.455460] i915 0000:00:02.0: Failed to idle engines, declaring wedged! ... <0> [333.491294] i915_sel-4916 1.... 333262143us : i915_reset_engine: rcs0 flags=4 <0> [333.491328] i915_sel-4916 1.... 333262143us : execlists_reset_prepare: rcs0: depth<-0 <0> [333.491362] i915_sel-4916 1.... 333262143us : intel_engine_stop_cs: rcs0 <0> [333.491396] i915_sel-4916 1d..1 333262144us : process_csb: rcs0 cs-irq head=5, tail=5 <0> [333.491424] i915_sel-4916 1.... 333262145us : intel_gpu_reset: engine_mask=1 <0> [333.491454] kworker/-214 5.... 333262184us : i915_gem_switch_to_kernel_context: awake?=yes <0> [333.491487] kworker/-214 5.... 333262192us : i915_request_add: rcs0 fence 2afac:1522 <0> [333.491520] kworker/-214 5.... 333262193us : i915_request_add: marking (null) as active <0> [333.491553] i915_sel-4916 1.... 333262199us : intel_engine_cancel_stop_cs: rcs0 <0> [333.491587] i915_sel-4916 1.... 333262199us : execlists_reset_finish: rcs0: depth->0 Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190313162835.30228-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Large ppGTT are differentiated by the requirement to go to four levels to address more than 32b. Given the introduction of more 4 level ppGTT with different sizes of addressable bits, rename i915_vm_is_48b() to better reflect the commonality of using 4 levels. Based on a patch by Bob Paauwe. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Bob Paauwe <bob.j.paauwe@intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-4-chris@chris-wilson.co.uk
-
- 12 3月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
Move the pair of messages to the common callsite where it makes sense to include a bit more information about which request is being reset. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190312111146.10662-1-chris@chris-wilson.co.uk
-
- 08 3月, 2019 6 次提交
-
-
由 Chris Wilson 提交于
Introduce a mutex to start locking the HW contexts independently of struct_mutex, with a view to reducing the coarse struct_mutex. The intel_context.pin_mutex is used to guard the transition to and from being pinned on the gpu, and so is required before starting to build any request. The intel_context will then remain pinned until the request completes, but the mutex can be released immediately unpin completion of pinning the context. A slight variant of the above is used by per-context sseu that wants to inspect the pinned status of the context, and requires that it remains stable (either !pinned or pinned) across its operation. By using the pin_mutex to serialise operations while pin_count==0, we can take that pin_mutex for stabilise the boolean pin status. v2: for Tvrtko! * Improved commit message. * Dropped _gpu suffix from gen8_modify_rpcs_gpu. v3: Repair the locking for sseu selftests Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Each engine acquires a pin on the kernel contexts (normal and preempt) so that the logical state is always available on demand. Keep track of each engines pin by storing the returned pointer on the engine for quick access. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-6-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Push the intel_context pin callback down from intel_engine_cs onto the context itself by virtue of having a central caller for intel_context_pin() being able to lookup the intel_context itself. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In preparation for an ever growing number of engines and so ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use. Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines. v2: Check for no HW context in guc_stage_desc_init Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
If we place a pointer to the engine specific intel_context_ops in the engine itself, we can assign the ops pointer on initialising the context, and then rely on it being set. This simplifies the code in later patches. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
For use in the next patch, if we track which engines have been used by the HW, we can reduce the work required to flush our state off the HW to those engines. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-1-chris@chris-wilson.co.uk
-
- 06 3月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
Instead of passing the gem_context and engine to find the instance of the intel_context to use, pass around the intel_context instead. This is useful for the next few patches, where the intel_context is no longer a direct lookup. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, lets store the full bitmask. v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/) v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout. v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
-
- 02 3月, 2019 3 次提交
-
-
由 Chris Wilson 提交于
We don't want to busywait on the GPU if we have other work to do. If we give non-busywaiting workloads higher (initial) priority than workloads that require a busywait, we will prioritise work that is ready to run immediately. We then also have to be careful that we don't give earlier semaphores an accidental boost because later work doesn't wait on other rings, hence we keep a history of semaphore usage of the dependency chain. v2: Stop rolling the bits into a chain and just use a flag in case this request or any of our dependencies use a semaphore. The rolling around was contagious as Tvrtko was heard to fall off his chair. Testcase: igt/gem_exec_schedule/semaphore Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Having introduced per-context seqno, we now have a means to identity progress across the system without feel of rollback as befell the global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in advance of submission safe in the knowledge that our target seqno and address is stable. However, since we are telling the GPU to busy-spin on the target address until it matches the signaling seqno, we only want to do so when we are sure that busy-spin will be completed quickly. To achieve this we only submit the request to HW once the signaler is itself executing (modulo preemption causing us to wait longer), and we only do so for default and above priority requests (so that idle priority tasks never themselves hog the GPU waiting for others). As might be reasonably expected, HW semaphores excel in inter-engine synchronisation microbenchmarks (where the 3x reduced latency / increased throughput more than offset the power cost of spinning on a second ring) and have significant improvement (can be up to ~10%, most see no change) for single clients that utilize multiple engines (typically media players and transcoders), without regressing multiple clients that can saturate the system or changing the power envelope dramatically. v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway. v4: Tell the world and include it as part of scheduler caps. Testcase: igt/gem_exec_whisper Testcase: igt/benchmarks/gem_wsim Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
On unwinding the active request we give it a small (limited to internal priority levels) boost to prevent it from being gazumped a second time. However, this means that it can be promoted to above the request that triggered the preemption request, causing a preempt-to-idle cycle for no change. We can avoid this if we take the boost into account when checking if the preemption request is valid. v2: After preemption the active request will be after the preemptee if they end up with equal priority. v3: Tvrtko pointed out that this, the existing logic, makes I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk! v4: Prove Tvrtko was right about WAIT being non-preemptible and test it. v5: Except not all priorities were made equal, and the WAIT not preempting is only if we start off as !NEWCLIENT. v6: More commentary after coming to an understanding about what I had forgotten to say. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-1-chris@chris-wilson.co.uk
-
- 01 3月, 2019 1 次提交
-
-
由 Chris Wilson 提交于
WAIT is occasionally suppressed by virtue of preempted requests being promoted to NEWCLIENT if they have not all ready received that boost. Make this consistent for all WAIT boosts that they are not allowed to preempt executing contexts and are merely granted the right to be at the front of the queue for the next execution slot. This is in keeping with the desire that the WAIT boost be a minor tweak that does not give excessive promotion to its user and open ourselves to trivial abuse. The problem with the inconsistent WAIT preemption becomes more apparent as the preemption is propagated across the engines, where one engine may preempt and the other not, and we be relying on the exact execution order being consistent across engines (e.g. using HW semaphores to coordinate parallel execution). v2: Also protect GuC submission from false preemption loops. v3: Build bug safeguards and better debug messages for st. v4: Do the priority bumping in unsubmit (i.e. on preemption/reset unwind), applying it earlier during submit causes out-of-order execution combined with execute fences. v5: Call sw_fence_fini for our dummy request (Matthew) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
-
- 28 2月, 2019 2 次提交
-
-
由 Chris Wilson 提交于
As kmem_caches share the same properties (size, allocation/free behaviour) for all potential devices, we can use global caches. While this potential has worse fragmentation behaviour (one can argue that different devices would have different activity lifetimes, but you can also argue that activity is temporal across the system) it is the default behaviour of the system at large to amalgamate matching caches. The benefit for us is much reduced pointer dancing along the frequent allocation paths. v2: Defer shrinking until after a global grace period for futureproofing multiple consumers of the slab caches, similar to the current strategy for avoiding shrinking too early. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Do a pass over all the engines upon starting to determine the global scheduler capability flags (those that are agreed upon by all). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-7-chris@chris-wilson.co.uk
-