- 06 9月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
If we can't grab the breadcrumb's spinlock, possibly due to a driver deadlock inside the waiters, ignore them. Like hangcheck, error capturing must work no matter how the driver/GPU dies. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20160906073844.22561-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 22 8月, 2016 1 次提交
-
-
由 David Weinehall 提交于
In an effort to simplify things for a future push of dev_priv instead of dev wherever possible, always take pdev via dev_priv where feasible, eliminating the direct access from dev. Right now this only eliminates a few cases of dev, but it also obviates that we pass dev into a lot of functions where dev_priv would be the more obvious choice. v2: Fixed one more place missing in the previous patch set Signed-off-by: NDavid Weinehall <david.weinehall@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-5-david.weinehall@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 20 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
As io_mapping.h now always allocates the struct, we can avoid that allocation and extra pointer dance by embedding the struct inside drm_i915_private Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
-
- 19 8月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
In order to handle tiled partial GTT mmappings, we need to associate the fence with an individual vma. v2: A couple of silly drops replaced spotted by Joonas Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-21-chris@chris-wilson.co.uk
-
- 15 8月, 2016 14 次提交
-
-
由 Chris Wilson 提交于
Just another useful register to inspect following a GPU hang. v2: Remove partial decoding of RING_MODE to userspace, be consistent and use GEN > 2 guards around RING_MODE everywhere. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-32-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
There is no other state pertaining to the completed requests in the hang, other than gleamed through the ringbuffer, so including the expired requests in the list of outstanding requests simply adds noise. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-31-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
It is useful when looking at captured error states to check the recorded BBADDR register (the address of the last batchbuffer instruction loaded) against the expected offset of the batch buffer, and so do a quick check that (a) the capture is true or (b) HEAD hasn't wandered off into the badlands. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-30-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Since contexts are not currently shared between userspace processes, we have an exact correspondence between context creator and guilty batch submitter. Therefore we can save some per-batch work by inspecting the context->pid upon error instead. Note that we take the context's creator's pid rather than the file's pid in order to better track fd passed over sockets. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-29-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Treat the VMA as the primary struct responsible for tracking bindings into the GPU's VM. That is we want to treat the VMA returned after we pin an object into the VM as the cookie we hold and eventually release when unpinning. Doing so eliminates the ambiguity in pinning the object and then searching for the relevant pin later. v2: Joonas' stylistic nitpicks, a fun rebase. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-27-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Use the GGTT VMA as the primary cookie for handing ring objects as the most common action upon the ring is mapping and unmapping which act upon the VMA itself. By restructuring the code to work with the ring VMA, we can shrink the code and remove a few cycles from context pinning. v2: Move the flush of the object back to before the first pin. We use the am-I-bound? query to only have to check the flush on the first bind and so avoid stalling on active rings. Lots of little renames and small hoops. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-18-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When working with contexts, we most frequently want the GGTT VMA for the context state, first and foremost. Since the object is available via the VMA, we need only then store the VMA. v2: Formatting tweaks to debugfs output, restored some comments removed in the next patch Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-15-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
No longer is knowing how much of the GTT (both mappable aperture and beyond) relevant, and the output clutters the real information - that is how many objects are allocated and bound (and by who) so that we can quickly grasp if there is a leak. v2: Relent, and rename pinned to indicate display only. Since the display objects are semi-static and are of variable size, they are the interesting objects to watch over time for aperture leaking. The other pins are either static (such as the scratch page) or very short lived (such as execbuf) and not part of the precious GGTT. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-6-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
With execlists, we have context objects everywhere, not just RCS. So store them for post-mortem debugging. This also has a secondary effect of removing one more unsafe list iteration with using preserved state from the hanging request. And now we can cross-reference the request's context state with that loaded by the GPU. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-3-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
When capturing the error state, we do not need to know about every address space - just those that are related to the error. We know which context is active at the time, therefore we know which VM are implicated in the error. We can then restrict the VM which we report to the relevant subset. v2: s/i/count_active/ (and similar) Rewrite label generation for "Buffers" Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-2-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Not only does it make for good documentation and debugging aide, but it is also vital for when we want to unwind requests - such as when throwing away an incomplete request. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1470414607-32453-2-git-send-email-arun.siluvery@linux.intel.comReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-1-git-send-email-chris@chris-wilson.co.uk
-
- 05 8月, 2016 3 次提交
-
-
由 Chris Wilson 提交于
In the previous commit, we moved the obj->tiling_mode out of a bitfield and into its own integer so that we could safely use READ_ONCE(). Let us now repair some of that damage by sharing the tiling_mode with its companion, the fence stride. v2: New magic Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In preparation to perform some magic to speed up i915_vma_pin(), which is among the hottest of hot paths in execbuf, refactor all the bitfields accessed by i915_vma_pin() into a single unified set of flags. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In the next few patches, the VMA pinning API is overhauled and to reduce the churn we pull out the update to the accessors into a prep patch. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
-
- 04 8月, 2016 4 次提交
-
-
由 Chris Wilson 提交于
We use "list" to denote the list and "link" to denote an element on that list. Rename request->list to match this idiom. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The future annotations will track the locking used for access to ensure that it is always sufficient. We make the preparations now to present the API ahead and to make sure that GCC can eliminate the unused parameter. Before: 6298417 3619610 696320 10614347 a1f64b vmlinux After: 6298417 3619610 696320 10614347 a1f64b vmlinux (with i915 builtin) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In the future, we will want to add annotations to the i915_gem_active struct. The API is thus expanded to hide direct access to the contents of i915_gem_active and mediated instead through a number of helpers. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
In the next patch, request tracking is made more generic and for that we need a new expanded struct and to separate out the logic changes from the mechanical churn, we split out the structure renaming into this patch. v2: Writer's block. Add some spiel about why we track requests. v3: Now i915_gem_active. v4: Now with i915_gem_active_set() for attaching to the active request. v5: Use i915_gem_active_set() from inside the retirement handlers Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk
-
- 03 8月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
The state stored in this struct is not only the information about the buffer object, but the ring used to communicate with the hardware. Using buffer here is overly specific and, for me at least, conflates with the notion of buffer objects themselves. s/struct intel_ringbuffer/struct intel_ring/ s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/ s/describe_ctx_ringbuf()/describe_ctx_ring()/ s/intel_ring_get_active_head()/intel_engine_get_active_head()/ s/intel_ring_sync_index()/intel_engine_sync_index()/ s/intel_ring_init_seqno()/intel_engine_init_seqno()/ s/ring_stuck()/engine_stuck()/ s/intel_cleanup_engine()/intel_engine_cleanup()/ s/intel_stop_engine()/intel_engine_stop()/ s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/ s/intel_unpin_ringbuffer()/intel_unpin_ring()/ s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/ s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/ s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/ s/intel_ringbuffer_free()/intel_ring_free()/ Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Now that we have disambuigated ring and engine, we can use the clearer and more consistent name for the intel_ringbuffer pointer in the request. @@ struct drm_i915_gem_request *r; @@ - r->ringbuf + r->ring Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-12-git-send-email-chris@chris-wilson.co.uk Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-2-git-send-email-chris@chris-wilson.co.uk
-
- 27 7月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
Inside the error capture itself, we refer to not only the hardware engine, its ringbuffer but also the capture state. Finding clear names for each whilst avoiding mixing ring/intel_engine_cs is tricky. As a compromise we keep using ering for the error capture. v2: Use 'ee' locals for struct drm_i915_error_engine Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-8-git-send-email-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469606850-28659-3-git-send-email-chris@chris-wilson.co.uk
-
- 20 7月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
Rather than recomputing whether semaphores are enabled, we can do that computation once during early initialisation as the i915.semaphores module parameter is now read-only. s/i915_semaphores_is_enabled/i915.semaphores/ v2: Add the state to the debug dmesg as well Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers. v2: The fence_context needs to be globally unique, not just unique to this device. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk
-
- 05 7月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
Since drm_i915_private is now a subclass of drm_device we do not need to chase the drm_i915_private->dev backpointer and can instead simply access drm_i915_private->drm directly. text data bss dec hex filename 1068757 4565 416 1073738 10624a drivers/gpu/drm/i915/i915.ko 1066949 4565 416 1071930 105b3a drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ struct drm_i915_private *d; identifier i; @@ ( - d->dev->i + d->drm.i | - d->dev + &d->drm ) and for good measure the dev_priv->dev backpointer was removed entirely. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMatthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
As we only ever keep the first error state around, we can avoid some work that can be quite intrusive if we don't record the error the second time around. This does move the race whereby the user could discard one error state as the second is being captured, but that race exists in the current code and we hope that recapturing error state is only done for debugging. Note that as we discard the error state for simulated errors, igt that exercise error capture continue to function. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467618513-4966-3-git-send-email-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
-
- 04 7月, 2016 3 次提交
-
-
由 Chris Wilson 提交于
Since we now subclass struct drm_device, we can save pointer dances by noting the equivalence of struct drm_device and struct drm_i915_private, i.e. by using to_i915(). text data bss dec hex filename 1073824 4562 416 1078802 107612 drivers/gpu/drm/i915/i915.ko 1068976 4562 416 1073954 106322 drivers/gpu/drm/i915/i915.ko Created by the coccinelle script: @@ expression E; identifier p; @@ - struct drm_i915_private *p = E->dev_private; + struct drm_i915_private *p = to_i915(E); Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
igt likes to inject GPU hangs into its command streams. However, as we expect these hangs, we don't actually want them recorded in the dmesg output or stored in the i915_error_state (usually). To accommodate this allow userspace to set a flag on the context that any hang emanating from that context will not be recorded. We still do the error capture (otherwise how do we find the guilty context and know its intent?) as part of the reason for random GPU hang injection is to exercise the race conditions between the error capture and normal execution. v2: Split out the request->ringbuf error capture changes. v3: Move the flag defines next to the intel_context->flags definition Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: NDave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-9-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
The request tells us where to read the ringbuf from, so use that information to simplify the error capture. If no request was active at the time of the hang, the ring is idle and there is no information inside the ring pertaining to the hang. Note carefully that this will reduce the amount of information stored in the error state - any ring without an active request will not be recorded. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NDave Gordon <david.s.gordon@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-8-git-send-email-chris@chris-wilson.co.uk
-
- 03 7月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
smatch complains: drivers/gpu/drm/i915/i915_gpu_error.c:503 i915_error_state_to_str() warn: inconsistent indenting Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-4-git-send-email-chris@chris-wilson.co.ukReviewed-by: NMatthew Auld <matthew.auld@intel.com>
-
- 02 7月, 2016 2 次提交
-
-
由 Chris Wilson 提交于
By using the same address for storing the HWS on every platform, we can remove the platform specific vfuncs and reduce the get-seqno routine to a single read of a cached memory location. v2: Fix semaphore_passed() to look at the signaling engine (not the waiter's) Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
One particularly stressful scenario consists of many independent tasks all competing for GPU time and waiting upon the results (e.g. realtime transcoding of many, many streams). One bottleneck in particular is that each client waits on its own results, but every client is woken up after every batchbuffer - hence the thunder of hooves as then every client must do its heavyweight dance to read a coherent seqno to see if it is the lucky one. Ideally, we only want one client to wake up after the interrupt and check its request for completion. Since the requests must retire in order, we can select the first client on the oldest request to be woken. Once that client has completed his wait, we can then wake up the next client and so on. However, all clients then incur latency as every process in the chain may be delayed for scheduling - this may also then cause some priority inversion. To reduce the latency, when a client is added or removed from the list, we scan the tree for completed seqno and wake up all the completed waiters in parallel. Using igt/benchmarks/gem_latency, we can demonstrate this effect. The benchmark measures the number of GPU cycles between completion of a batch and the client waking up from a call to wait-ioctl. With many concurrent waiters, with each on a different request, we observe that the wakeup latency before the patch scales nearly linearly with the number of waiters (before external factors kick in making the scaling much worse). After applying the patch, we can see that only the single waiter for the request is being woken up, providing a constant wakeup latency for every operation. However, the situation is not quite as rosy for many waiters on the same request, though to the best of my knowledge this is much less likely in practice. Here, we can observe that the concurrent waiters incur extra latency from being woken up by the solitary bottom-half, rather than directly by the interrupt. This appears to be scheduler induced (having discounted adverse effects from having a rbtree walk/erase in the wakeup path), each additional wake_up_process() costs approximately 1us on big core. Another effect of performing the secondary wakeups from the first bottom-half is the incurred delay this imposes on high priority threads - rather than immediately returning to userspace and leaving the interrupt handler to wake the others. To offset the delay incurred with additional waiters on a request, we could use a hybrid scheme that did a quick read in the interrupt handler and dequeued all the completed waiters (incurring the overhead in the interrupt handler, not the best plan either as we then incur GPU submission latency) but we would still have to wake up the bottom-half every time to do the heavyweight slow read. Or we could only kick the waiters on the seqno with the same priority as the current task (i.e. in the realtime waiter scenario, only it is woken up immediately by the interrupt and simply queues the next waiter before returning to userspace, minimising its delay at the expense of the chain, and also reducing contention on its scheduler runqueue). This is effective at avoid long pauses in the interrupt handler and at avoiding the extra latency in realtime/high-priority waiters. v2: Convert from a kworker per engine into a dedicated kthread for the bottom-half. v3: Rename request members and tweak comments. v4: Use a per-engine spinlock in the breadcrumbs bottom-half. v5: Fix race in locklessly checking waiter status and kicking the task on adding a new waiter. v6: Fix deciding when to force the timer to hide missing interrupts. v7: Move the bottom-half from the kthread to the first client process. v8: Reword a few comments v9: Break the busy loop when the interrupt is unmasked or has fired. v10: Comments, unnecessary churn, better debugging from Tvrtko v11: Wake all completed waiters on removing the current bottom-half to reduce the latency of waking up a herd of clients all waiting on the same request. v12: Rearrange missed-interrupt fault injection so that it works with igt/drv_missed_irq_hang v13: Rename intel_breadcrumb and friends to intel_wait in preparation for signal handling. v14: RCU commentary, assert_spin_locked v15: Hide BUG_ON behind the compiler; report on gem_latency findings. v16: Sort seqno-groups by priority so that first-waiter has the highest task priority (and so avoid priority inversion). v17: Add waiters to post-mortem GPU hang state. v18: Return early for a completed wait after acquiring the spinlock. Avoids adding ourselves to the tree if the is already complete, and skips the awkward question of why we don't do completion wakeups for waits earlier than or equal to ourselves. v19: Prepare for init_breadcrumbs to fail. Later patches may want to allocate during init, so be prepared to propagate back the error code. Testcase: igt/gem_concurrent_blit Testcase: igt/benchmarks/gem_latency Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com> Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Dave Gordon <david.s.gordon@intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18 Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
-
- 11 5月, 2016 1 次提交
-
-
由 Tvrtko Ursulin 提交于
This way optimization from a previous patch works even better. v2: Rebase. Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJani Nikula <jani.nikula@intel.com>
-
- 09 5月, 2016 1 次提交
-
-
由 Chris Wilson 提交于
text data bss dec hex filename 6309351 3578714 696320 10584385 a18141 vmlinux 6308391 3578714 696320 10583425 a17d81 vmlinux Almost 1KiB of code reduction. v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions text data bss dec hex filename 6304579 3578778 696320 10579677 a16edd vmlinux 6303427 3578778 696320 10578525 a16a5d vmlinux Now over 1KiB! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
-