- 06 March 2018, 1 commit

Submitted by Chris Wilson
The goal here is to try and reduce the latency of signaling additional requests following the wakeup from interrupt, by reducing the list of to-be-signaled requests from an rbtree to a sorted linked list. The original choice of an rbtree was to facilitate random insertions of requests into the signaler while maintaining a sorted list. However, if we assume that most new requests are added when they are submitted, we see those new requests in execution order, making an insertion sort fast, and the reduction in overhead of each signaler iteration significant.

Since commit 56299fb7 ("drm/i915: Signal first fence from irq handler if complete"), we signal most fences directly from notify_ring() in the interrupt handler, greatly reducing the amount of work that actually needs to be done by the signaler kthread. All the thread is then required to do is operate as the bottom half, cleaning up after the interrupt handler and preparing the next waiter. This includes signaling all later completed fences in a saturated system, but on a mostly idle system we only have to rebuild the wait rbtree in time for the next interrupt. With this de-emphasis of the signaler's role, we want to rejig its data structures to reduce the amount of work we require both to set up the signal tree and to maintain it on every interrupt.

References: 56299fb7 ("drm/i915: Signal first fence from irq handler if complete")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-1-chris@chris-wilson.co.uk
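The win described above is the classic near-sorted-input property of insertion sort. A minimal sketch, not the driver's actual code, of inserting into a sorted list when arrivals are almost always in execution order (seqno_passed mirrors the wrap-safe comparison style i915 uses):

```c
#include <linux/list.h>
#include <linux/types.h>

struct signal_node {
	u32 seqno;		/* breadcrumb seqno to signal on */
	struct list_head link;
};

static bool seqno_passed(u32 a, u32 b)
{
	return (s32)(a - b) >= 0;	/* wrap-safe ordering test */
}

static void insert_signal(struct list_head *signals, struct signal_node *node)
{
	struct signal_node *pos;

	/* Scan from the tail: in-order arrival makes this O(1) in practice. */
	list_for_each_entry_reverse(pos, signals, link)
		if (seqno_passed(node->seqno, pos->seqno))
			break;

	/* If the loop completed without breaking, &pos->link aliases the
	 * list head and we insert at the front; otherwise we insert after
	 * the first entry that the new node passes. */
	list_add(&node->link, &pos->link);
}
```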
-
- 22 February 2018, 1 commit

Submitted by Chris Wilson
We want to de-emphasize the link between the request (dependency, execution and fence tracking) and GEM, and so rename the struct from drm_i915_gem_request to i915_request. That is, we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:

@@
@@
- struct drm_i915_gem_request
+ struct i915_request

As a corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space is of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 19 January 2018, 1 commit

Submitted by Chris Wilson
Watching a light workload on Baytrail (running glxgears and a 1080p decode), instead of the system remaining at low frequency, glxgears would regularly trigger waitboosting, after which it would have to spend a few seconds throttling back down. In this case, the waitboosting is counterproductive, as the minimal wait for glxgears doesn't prevent it from functioning correctly and delivering frames on time. Here, glxgears happens to almost always be waiting on the current request, which we already expect to complete quickly (see i915_spin_request), so avoiding the waitboost on the active request and spinning instead provides the best latency without overcommitting to upclocking. However, if the system falls behind, we still force the waitboost. Similarly, we will also trigger upclocking if we detect the system is not delivering frames on time, again using a mechanism that tries to detect a miss rather than preemptively upclocking.

v2: Also skip boosting after a missed vblank if the desired request is already active.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180118131609.16574-1-chris@chris-wilson.co.uk
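The policy reads roughly as the sketch below. Every helper name here is hypothetical, purely to illustrate the boost-only-if-not-yet-executing decision; it is not the driver's wait path:

```c
/* Hypothetical sketch of the wait policy described above. */
static long wait_for_request(struct request *rq, long timeout)
{
	if (request_is_active(rq)) {
		/* Already executing: expected to complete quickly, so a
		 * short busy-wait beats raising the GPU frequency. */
		if (spin_until_complete(rq, 5 /* us, illustrative */))
			return timeout;
	} else {
		/* Not yet running: request an RPS waitboost so that the
		 * GPU clocks up and the request starts sooner. */
		apply_waitboost(rq);
	}

	return sleep_until_complete(rq, timeout);
}
```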
-
- 03 January 2018, 2 commits

Submitted by Chris Wilson
Back up our comment that all signalers should have been signaled before we ourselves were retired, with an assert to that effect.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-9-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
i915_gem_request_assign() has not been used since commit 77f0d0e9 ("drm/i915/execlists: Pack the count into the low bits of the port.request"), so remove the defunct code.

References: 77f0d0e9 ("drm/i915/execlists: Pack the count into the low bits of the port.request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180102151235.3949-1-chris@chris-wilson.co.uk
-
- 13 December 2017, 1 commit

Submitted by Chris Wilson
If a fence allocation fails in a blocking context, we will sleep on the fence as a last resort. We can therefore allow ourselves to fail the allocation and sleep on an existing fence instead of triggering a system-wide oom. This allows us to throttle malicious clients that are consuming lots of system resources, by capping the amount of memory used by fences.

Testcase: igt/gem_shrink/execbufX
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171212180652.22061-2-chris@chris-wilson.co.uk
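A sketch of the fail-then-sleep throttle under stated assumptions: wait_on_oldest_fence() and struct fence_entry are hypothetical names, while __GFP_RETRY_MAYFAIL and __GFP_NOWARN are the standard GFP flags for asking the allocator not to escalate to the OOM killer:

```c
/* Illustrative only; not the driver's actual allocation path. */
static struct fence_entry *alloc_fence_throttled(struct kmem_cache *cache)
{
	struct fence_entry *f;

	f = kmem_cache_alloc(cache,
			     GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	if (f)
		return f;

	/* Out of memory: block on an outstanding fence so the client is
	 * throttled, then try once more before reporting ENOMEM. */
	wait_on_oldest_fence();		/* hypothetical helper */
	return kmem_cache_alloc(cache, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
}
```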
-
- 05 October 2017, 1 commit

Submitted by Chris Wilson
Use a priority stored in the context as the initial value when submitting a request. This allows us to change the default priority on a per-context basis, allowing different contexts to be favoured with GPU time at the expense of lower-importance work.

The user can adjust the context's priority via I915_CONTEXT_PARAM_PRIORITY, with more positive values being higher priority (they will be serviced earlier, after their dependencies have been resolved). Any prerequisite work for an execbuf will have its priority raised to match the new request as required.

Normal users can specify any value in the range of -1023 to 0 [default], i.e. they can reduce the priority of their workloads (and temporarily boost it back to normal if so desired). Privileged users can specify any value in the range of -1023 to 1023 [default is 0], i.e. they can raise their priority above all others and so potentially starve the system.

Note that the existing schedulers are not fair, nor load balancing; execution is strictly by priority on a first-come, first-served basis, and the driver may choose to boost some requests above the range available to users.

This priority was originally based around nice(2), but evolved to allow clients to adjust their priority within a small range, and to allow for a privileged high-priority range. For example, this can be used to implement EGL_IMG_context_priority
https://www.khronos.org/registry/egl/extensions/IMG/EGL_IMG_context_priority.txt

  "EGL_CONTEXT_PRIORITY_LEVEL_IMG determines the priority level of the context to be created. This attribute is a hint, as an implementation may not support multiple contexts at some priority levels and system policy may limit access to high priority contexts to appropriate system privilege level. The default value for EGL_CONTEXT_PRIORITY_LEVEL_IMG is EGL_CONTEXT_PRIORITY_MEDIUM_IMG."

so we can map

  PRIORITY_HIGH -> 1023 [privileged, will fall back to 0]
  PRIORITY_MED  ->    0 [default]
  PRIORITY_LOW  -> -1023

They also map onto the priorities used by VkQueue (and a VkQueue is essentially a timeline, our i915_gem_context under full-ppgtt).

v2: s/CAP_SYS_ADMIN/CAP_SYS_NICE/
v3: Report min/max user priorities as defines in the uapi, and rebase internal priorities on the exposed values.

Testcase: igt/gem_exec_schedule
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-9-chris@chris-wilson.co.uk
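From userspace, adjusting a context's priority is a single SETPARAM call. A minimal sketch using the uAPI named above (struct and ioctl from drm/i915_drm.h; context creation and error handling omitted):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Sketch: set a context's scheduling priority. ctx_id comes from
 * DRM_IOCTL_I915_GEM_CONTEXT_CREATE; raising the priority above 0
 * additionally requires CAP_SYS_NICE, per the v2 note above. */
static int set_context_priority(int drm_fd, uint32_t ctx_id, int64_t prio)
{
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_PRIORITY,
		.value = (uint64_t)prio,	/* -1023 (low) .. 1023 (high) */
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}
```

For example, a compositor might call set_context_priority(fd, ctx, -1023) on a context used purely for background blits.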
-
- 29 September 2017, 1 commit

Submitted by Chris Wilson
We use INT_MIN to denote the priority of a request that has not been submitted to the scheduler; we treat INT_MIN as an invalid priority and initialise the request to it. Give the value a name so it stands out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-3-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
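A generic sketch of the sentinel pattern; the macro name below is an assumption for illustration, as the log does not state the name chosen:

```c
#include <assert.h>
#include <limits.h>

/* Assumed name; INT_MIN is the sentinel value the commit describes. */
#define PRIORITY_INVALID INT_MIN

struct sched_node {
	int priority;
};

static void sched_node_init(struct sched_node *node)
{
	node->priority = PRIORITY_INVALID;	/* not yet submitted */
}

static void sched_node_queue(struct sched_node *node, int prio)
{
	assert(prio != PRIORITY_INVALID);	/* the sentinel now stands out */
	node->priority = prio;
}
```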
-
- 23 September 2017, 1 commit

Submitted by Chris Wilson
No users now outside of i915_wait_request(), so we can make it private to i915_gem_request.c, and assume the caller knows the seqno. In the process, also remove i915_gem_request_started(), as that was only ever used by i915_spin_request().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922120333.25535-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 28 June 2017, 1 commit

Submitted by Chris Wilson
Once a client has requested a waitboost, we keep that waitboost active until all clients are no longer waiting. This is because we don't distinguish which waiter deserves the boost. However, with the advent of fence signaling, the signaler threads appear as waiters to the RPS interrupt handler. So instead of using a single boolean to track when to keep the waitboost active, use a counter of all outstanding waitboosted requests.

At this point, I have removed all vestiges of the rate limiting on clients. Whilst this means that compositors should remain more fluid, it also means that boosts are more prevalent. See commit b29c19b6 ("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion on the pros and cons of both approaches.

A drawback of this implementation is that it requires constant request submission to keep the waitboost trimmed (as it is now cancelled when the request is completed). This will be fine for a busy system, but near idle the boosts may be kept for longer than desired (effectively tens of vblanks in the worst case) and there is a reliance on rc6 instead.

v2: Remove defunct rps.client_lock

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
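The boolean-to-counter change amounts to reference counting the boost. A generic sketch, not the driver's code, with hypothetical frequency helpers:

```c
#include <linux/atomic.h>

static atomic_t num_waitboosts = ATOMIC_INIT(0);

/* Each boosted wait takes a reference; the boost is applied on the
 * first and dropped only when the last outstanding one completes. */
static void start_boosted_wait(void)
{
	if (atomic_inc_return(&num_waitboosts) == 1)
		gpu_set_boost_frequency();	/* hypothetical helper */
}

static void end_boosted_wait(void)
{
	if (atomic_dec_and_test(&num_waitboosts))
		gpu_restore_frequency();	/* hypothetical helper */
}
```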
-
- 20 June 2017, 1 commit

Submitted by Ingo Molnar
Rename: wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
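For a typical open-coded waiter, the effect is a one-line type change; the waitqueue API itself is untouched. A small sketch:

```c
#include <linux/sched.h>
#include <linux/wait.h>

/* Before the rename, the local below was declared 'wait_queue_t';
 * only the entry type's name changes. */
static void wait_for_flag(wait_queue_head_t *wq, const bool *flag)
{
	struct wait_queue_entry entry;

	init_waitqueue_entry(&entry, current);
	add_wait_queue(wq, &entry);
	while (!READ_ONCE(*flag)) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (READ_ONCE(*flag))
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);
	remove_wait_queue(wq, &entry);
}
```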
-
- 17 May 2017, 2 commits

Submitted by Chris Wilson
All the requests at the same priority are executed in FIFO order. They do not need to be stored in the rbtree themselves, as they are a simple list within a level. If we move the requests at one priority into a list, we can then reduce the rbtree to the set of priorities. This should keep the height of the rbtree small, as the number of active priorities cannot exceed the number of active requests and should typically be only a few.

Currently, we have ~2k possible different priority levels, and that may increase to allow even more fine-grained selection. Allocating those in advance seems a waste (and may be impossible), so we opt for allocating upon first use, and freeing after its requests are depleted. To avoid the possibility of an allocation failure causing us to lose a request, we preallocate the default priority (0) and bump any request to that priority if we fail to allocate the appropriate plist. Having a request (that is ready to run, so not leading to corruption) execute out of order is better than leaking the request (and its dependency tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally using a simple list (and reducing the frequency of both rbtree iteration and balancing on erase), but for typical workloads, request coalescing should be small enough that we don't notice any change. The main gain is from improving PI calls to schedule, and the explicit list within a level should make request unwinding simpler (we just need to insert at the head of the list rather than the tail and not have to make the rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist
v3: Michał found the solution to handling the allocation failure gracefully. If we disable all priority scheduling following the allocation failure, those requests will be executed in fifo and we will ensure that this request and its dependencies are in strict fifo (even when it doesn't realise it is only a single list). Normal scheduling is restored once we know the device is idle, until the next failure!

Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
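A sketch of the resulting structure, an rbtree keyed by priority whose nodes each carry a FIFO of requests (illustrative types, not the driver's exact code):

```c
#include <linux/list.h>
#include <linux/rbtree.h>
#include <linux/slab.h>

struct priolist {
	struct rb_node node;		/* position in the priority rbtree */
	struct list_head requests;	/* FIFO of requests at this priority */
	int priority;
};

/* Find or create the list for a priority. Returns NULL only if the
 * allocation fails, in which case the caller falls back to the
 * preallocated default-priority node, as described above. */
static struct priolist *priolist_lookup(struct rb_root *root, int prio)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;
	struct priolist *pl;

	while (*p) {
		parent = *p;
		pl = rb_entry(parent, struct priolist, node);
		if (prio == pl->priority)
			return pl;
		p = prio < pl->priority ? &parent->rb_left
					: &parent->rb_right;
	}

	pl = kmalloc(sizeof(*pl), GFP_ATOMIC);
	if (!pl)
		return NULL;

	pl->priority = prio;
	INIT_LIST_HEAD(&pl->requests);
	rb_link_node(&pl->node, parent, p);
	rb_insert_color(&pl->node, root);
	return pl;
}
```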
-
Submitted by Chris Wilson
Explicitly assign the default priority, and give it a name. After much discussion, we have chosen to call it I915_PRIORITY_NORMAL!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-7-chris@chris-wilson.co.uk
-
- 19 April 2017, 1 commit

Submitted by Paul E. McKenney
A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks.

However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
[ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ]
Acked-by: David Rientjes <rientjes@google.com>
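The rename matters because the guarantee is subtle: under SLAB_TYPESAFE_BY_RCU, an RCU reader may observe a recycled object of the same type, so it must take a reference and then revalidate identity. A generic sketch of that pattern:

```c
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct obj {
	refcount_t ref;
	u64 id;
};

static struct kmem_cache *obj_cache;

static int init_cache(void)
{
	obj_cache = kmem_cache_create("objs", sizeof(struct obj), 0,
				      SLAB_TYPESAFE_BY_RCU, NULL);
	return obj_cache ? 0 : -ENOMEM;
}

static struct obj *obj_tryget(struct obj *candidate, u64 id)
{
	rcu_read_lock();
	/* The memory stays a 'struct obj' while we hold rcu_read_lock(),
	 * but it may have been freed and reused for another instance. */
	if (!refcount_inc_not_zero(&candidate->ref)) {
		candidate = NULL;
	} else if (candidate->id != id) {	/* recycled for someone else? */
		refcount_dec(&candidate->ref);	/* sketch; real code calls its put() */
		candidate = NULL;
	}
	rcu_read_unlock();
	return candidate;
}
```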
-
- 15 April 2017, 1 commit

Submitted by Chris Wilson
Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may use to indicate that it wants the contents of this buffer preserved in the error state (/sys/class/drm/cardN/error) following a GPU hang involving this batch. Use this at your discretion: the contents of the error state, although compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all eternity (until the error state is destroyed).

Based on an earlier patch by Ben Widawsky <ben@bwidawsk.net>

Testcase: igt/gem_exec_capture
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Matt Turner <mattst88@gmail.com>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170415093902.22581-1-chris@chris-wilson.co.uk
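On the userspace side, opting a buffer into the hang dump is just a flag on its execobject. A minimal sketch (batch construction, relocations, and error handling omitted):

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int submit_with_capture(int fd, uint32_t batch_handle,
			       uint32_t debug_handle)
{
	struct drm_i915_gem_exec_object2 obj[2];
	struct drm_i915_gem_execbuffer2 execbuf;

	memset(obj, 0, sizeof(obj));
	obj[0].handle = debug_handle;
	obj[0].flags = EXEC_OBJECT_CAPTURE;	/* keep this buffer in the hang dump */
	obj[1].handle = batch_handle;		/* batch goes last by convention */

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.buffers_ptr = (uint64_t)(uintptr_t)obj;
	execbuf.buffer_count = 2;

	return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
}
```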
-
- 17 March 2017, 1 commit

Submitted by Chris Wilson
The only time we need to emit a flush inside request emission is after an execbuffer, for which we can use the full __i915_add_request(). All other instances want the simpler i915_add_request() without flushing, so remove the useless helper.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170317114709.8388-1-chris@chris-wilson.co.uk
-
- 09 March 2017, 1 commit

Submitted by Chris Wilson
__i915_gem_request_started() asserts that the seqno is valid, but i915_spin_request() was not checking before querying whether the request had started.

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Fixes: 754c9fd5 ("drm/i915: Protect the request->global_seqno with the engine->timeline lock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170308142238.22994-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-
- 03 March 2017, 1 commit

Submitted by Chris Wilson
Add to the tail of the client request list, as the only other user is in the throttle ioctl, which iterates forwards over the list. It only needs protection against deletion of a request as it reads it; it simply won't see a new request added to the end of the list, or it would be too early and rejected. We can further reduce the number of spinlocks required when throttling by removing stale requests from the client_list as we throttle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk
-
- 28 February 2017, 1 commit

Submitted by Chris Wilson
As execlists and other non-semaphore multi-engine devices coordinate between engines using interrupts, we can shave off a few tens of microseconds of scheduling latency by doing the fence signaling from the interrupt, as opposed to an RT kthread. (Realistically the delay adds about 1% to an individual cross-engine workload.) We only signal the first fence in order to limit the amount of work we move into the interrupt handler. We also have to remember that our breadcrumbs may be unordered with respect to the interrupt, and so we still require the waiter process to perform some heavyweight coherency fixups, as well as traversing the tree of waiters.

v2: No need for early exit in irq handler - it breaks the flow between patches and prevents the tracepoint
v3: Restore rcu hold across irq signaling of request

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
-
- 23 February 2017, 4 commits

Submitted by Chris Wilson
After the request is cancelled, we then need to remove it from the global execution timeline and return it to the context timeline, the inverse of submit_request().

v2: Move manipulation of struct intel_wait to helpers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-12-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
-
Submitted by Chris Wilson
A request is assigned a global seqno only when it is on the hardware execution queue. The global seqno can be used to maintain a list of requests on the same engine in retirement order, for example for constructing a priority queue for waiting. Prior to its execution, or if it is subsequently removed in the event of preemption, its global seqno is zero. As both insertion and removal from the execution queue may operate in IRQ context, it is not guarded by the usual struct_mutex BKL. Instead those relying on the global seqno must be prepared for its value to change between reads. Only when the request is complete can the global seqno be stable (due to the memory barriers on submitting the commands to the hardware to write the breadcrumb: if the HWS shows that it has passed the global seqno and the global seqno is unchanged after the read, it is indeed complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
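The completion check described in the last sentence looks roughly like this sketch; field and helper names are illustrative, not the driver's:

```c
/* Lockless completion test: because the global seqno may be revoked on
 * preemption, a reader must confirm it did not change across the check. */
static bool request_completed(const struct request *rq)
{
	u32 seqno;

	seqno = READ_ONCE(rq->global_seqno);
	if (!seqno)			/* not on the hardware queue */
		return false;

	return hws_seqno_passed(rq->engine, seqno) &&	/* breadcrumb written? */
	       READ_ONCE(rq->global_seqno) == seqno;	/* and not reassigned */
}
```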
-
Submitted by Chris Wilson
On reflection, we are only using the execute fence as a waitqueue on the global_seqno and not using it for dependency tracking between fences (unlike the submit and dma fences). By only treating it as a waitqueue, we can then treat it similarly to the other waitqueues during submit, making the code simpler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-8-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
Replace the global device seqno with one for each engine, and account for in-flight seqnos on each separately. This is consistent with dma-fence, as each timeline has separate fence contexts for each engine and a seqno is only ordered within a fence context (i.e. seqnos do not need to be ordered with respect to other engines, just ordered within a single engine). This is required to enable request rewinding for preemption on individual engines (we have to rewind the global seqno to avoid overflow, and we do not have to rewind all engines just to preempt one).

v2: Rename active_seqno to inflight_seqnos to more clearly indicate that it is a counter and not equivalent to the existing seqno. Update functions that operated on active_seqno similarly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-3-chris@chris-wilson.co.uk
-
- 03 January 2017, 1 commit

Submitted by Ville Syrjälä
The i915_gem_active stuff doesn't like a NULL ->retire hook, but the overlay code can set it to NULL. That obviously ends up oopsing. Fix it by introducing a new helper to assign the retirement callback that will switch out the NULL function pointer with i915_gem_retire_noop.

Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Fixes: 0d9bdd88 ("drm/i915: Convert intel_overlay to request tracking")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207175647.10018-1-chris@chris-wilson.co.uk
(cherry picked from commit ecd9caa0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
-
- 23 December 2016, 1 commit

Submitted by Ville Syrjälä
The i915_gem_active stuff doesn't like a NULL ->retire hook, but the overlay code can set it to NULL. That obviously ends up oopsing. Fix it by introducing a new helper to assign the retirement callback that will switch out the NULL function pointer with i915_gem_retire_noop.

Cc: stable@vger.kernel.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Fixes: 0d9bdd88 ("drm/i915: Convert intel_overlay to request tracking")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161207175647.10018-1-chris@chris-wilson.co.uk
-
- 19 December 2016, 1 commit

Submitted by Chris Wilson
The requests conversion introduced a nasty bug where we could generate a new request in the middle of constructing a request, if we needed to idle the system in order to evict space for a context. The request to idle would be executed (and waited upon) before the current one, creating minor havoc in the seqno accounting, as we would consider the current request to already be completed (prior to deferred seqno assignment), but ring->last_retired_head would have been updated and still could allow us to overwrite the current request before execution.

We also employed two different mechanisms to track the active context until it was switched out. The legacy method allowed for waiting upon an active context (it could forcibly evict any vma, including a context's), but the execlists method took a step backwards by pinning the vma for the entire active lifespan of the context (the only way to evict was to idle the entire GPU, not individual contexts). However, to circumvent the tricky issue of locking (i.e. we cannot take struct_mutex at the time of i915_gem_request_submit(), where we would want to move the previous context onto the active tracker and unpin it), we take the execlists approach and keep the contexts pinned until retirement. The benefit of the execlists approach, more important for execlists than legacy, was the reduction in work in pinning the context for each request - as the context was kept pinned until idle, it could short-circuit the pinning for all active contexts.

We introduce new engine vfuncs to pin and unpin the context respectively. The context is pinned at the start of the request, and only unpinned when the following request is retired (this ensures that the context is idle and coherent in main memory before we unpin it). We move the engine->last_context tracking into the retirement itself (rather than during request submission) in order to allow the submission to be reordered or unwound without undue difficulty. And finally, an ulterior motive for unifying context handling was to prepare for mock requests.

v2: Rename to last_retired_context, split out legacy_context tracking for MI_SET_CONTEXT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
-
- 15 November 2016, 4 commits

Submitted by Chris Wilson
Track the priority of each request and use it to determine the order in which we submit requests to the hardware via execlists. The priority of the request is determined by the user (eventually via the context) but may be overridden at any time by the driver. When we set the priority of the request, we bump the priority of all of its dependencies to match, so that a high-priority drawing operation is not stuck behind a background task.

When the request is ready to execute (i.e. we have signaled the submit fence following completion of all its dependencies, including third-party fences), we put the request into a priority-sorted rbtree to be submitted to the hardware. If the request is higher priority than all pending requests, it will be submitted on the next context-switch interrupt as soon as the hardware has completed the current request. We do not currently preempt any current execution to immediately run a very high priority request, at least not yet.

One more limitation is that this first implementation is for execlists only, so it is currently limited to gen8/gen9.

v2: Replace recursive priority inheritance bumping with an iterative depth-first search list.
v3: list_next_entry() for walking lists
v4: Explain how the dfs solves the recursion problem with PI.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-8-chris@chris-wilson.co.uk
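The iterative priority-inheritance walk mentioned in v2 can be sketched with an explicit worklist instead of recursion (illustrative types, not the driver's code; a real implementation must also handle locking and cleanup of the scratch links):

```c
#include <linux/list.h>

struct sched_node {
	int priority;
	struct list_head signalers;	/* dependencies this node waits on */
	struct list_head dfs_link;	/* scratch link, empty when not in a walk */
};

struct sched_dep {
	struct sched_node *signaler;
	struct list_head link;		/* entry in a waiter's ->signalers */
};

static void bump_priority(struct sched_node *node, int prio)
{
	LIST_HEAD(dfs);
	struct sched_node *pos;

	list_add(&node->dfs_link, &dfs);

	/* Appending to the tail while iterating is safe for list_head and
	 * expands the dependency graph without recursion. */
	list_for_each_entry(pos, &dfs, dfs_link) {
		struct sched_dep *dep;

		list_for_each_entry(dep, &pos->signalers, link)
			if (dep->signaler->priority < prio &&
			    list_empty(&dep->signaler->dfs_link))
				list_add_tail(&dep->signaler->dfs_link, &dfs);
	}

	list_for_each_entry(pos, &dfs, dfs_link)
		if (pos->priority < prio)
			pos->priority = prio;
	/* A real implementation would now unlink and re-init each dfs_link. */
}
```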
-
Submitted by Chris Wilson
The scheduler needs to know the dependencies of each request for the lifetime of the request, as it may choose to reschedule the requests at any time and must ensure the dependency tree is not broken. This is in addition to using the fence to only allow execution after all dependencies have been completed.

One option was to extend the fence to support the bidirectional dependency tracking required by the scheduler. However, the mismatch in lifetimes between the submit fence and the request essentially meant that we had to build a completely separate struct (and we could not simply reuse the existing waitqueue in the fence for one half of the dependency tracking). The extra dependency tracking simply did not mesh well with the fence, and keeping it separate both keeps the fence implementation simpler and allows us to extend the dependency tracking into a priority tree (whilst maintaining support for reordering the tree).

To avoid the additional allocations and list manipulations, the use of the priotree is disabled when there are no schedulers to use it.

v2: Create a dedicated slab for i915_dependency. Rename the lists.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-7-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
Defer the transfer from the client's timeline onto the execution timeline from the point of readiness to the point of actual submission. For example, in execlists, a request is finally submitted to hardware when the hardware is ready, and only put onto the hardware queue when the request is ready. By deferring the transfer, we ensure that the timeline is maintained in retirement order if we decide to queue the requests onto the hardware in a different order than fifo.

v2: Rebased onto distinct global/user timeline lock classes.
v3: Play with the position of the spin_lock().
v4: Nesting finally resolved with distinct sw_fence lock classes.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-4-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
In order to support deferred scheduling, we need to differentiate between when the request is ready to run (i.e. the submit fence is signaled) and when the request is actually run (a new execute fence). This is typically split between the request itself wanting to wait upon others (for which we use the submit fence) and the CPU wanting to wait upon the request, for which we use the execute fence to be sure the hardware is ready to signal completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161114204105.29171-3-chris@chris-wilson.co.uk
-
- 11 November 2016, 1 commit

Submitted by Joonas Lahtinen
As a side product, I had to split two other files:
- i915_gem_fence_reg.h
- i915_gem_object.h (only parts that needed immediate untanglement)

I tried to move code in as big chunks as possible, to make review easier. i915_vma_compare was moved to a header temporarily.

v2: Use i915_gem_fence_reg.{c,h}
v3: Rebased
v4: Fix building when DEBUG_GEM is enabled by reordering a bit.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1478861034-30643-1-git-send-email-joonas.lahtinen@linux.intel.com
-
- 08 November 2016, 1 commit

Submitted by Joonas Lahtinen
Get rid of sloppy inline functions now that we don't have more users:
- i915_gem_request_get_seqno
- i915_gem_request_get_engine

v2: request->engine is always non-NULL (Chris)

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1478589108-3702-1-git-send-email-joonas.lahtinen@linux.intel.com
-
- 29 October 2016, 6 commits

Submitted by Chris Wilson
Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking the execution of requests simple, and is vital for preserving a single waiter (i.e. so that we can order the waiters such that only the earliest to wake up need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) from that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space, but we use a fence context per engine within. Our first step towards introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.)

v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips.

Caveats:

* busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which *i915* engines are engaged with this object.

* non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting".

* dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts).

* loss of object-level retirement callbacks, emulated by VMA retirement tracking.

* minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS-boosting waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low-level wait and prepare for extending the GEM interface for use of reservation_objects.

v2: Rewrite i915_wait_request() kerneldocs

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
Since we only use the more generic unlocked variant, just rename it as the normal i915_gem_active_wait(). The temporary cost is that we need to always acquire the reference in an RCU-safe manner, but the benefit is that we will combine the common paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-5-chris@chris-wilson.co.uk
-
Submitted by Chris Wilson
We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request.

v2: Add a comment detailing a failure to handle a signal-on-any fence-array.
v3: Pretend that magic numbers don't exist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
-
- 25 October 2016, 1 commit

Submitted by Chris Wilson
I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:

@@
@@
- struct fence
+ struct dma_fence
@@
@@
- struct fence_ops
+ struct dma_fence_ops
@@
@@
- struct fence_cb
+ struct dma_fence_cb
@@
@@
- struct fence_array
+ struct dma_fence_array
@@
@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@
@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
(
...
)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
-
- 09 September 2016, 1 commit

Submitted by Chris Wilson
Now that we have fences in place to drive request submission, we can employ those to queue requests after their dependencies, as opposed to stalling in the middle of an execbuf ioctl. (However, we still choose to spin before enabling the IRQ, as that is faster - though contentious.)

v2: Do the fence ordering first, where we can still fail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-20-chris@chris-wilson.co.uk
-