- 03 5月, 2017 6 次提交
-
-
由 Chris Wilson 提交于
Rather than use a global modparam, we can just check to see if the engine has semaphores configured upon it. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
As we may unwind the requests, even though the request we are awaiting has a global_seqno that seqno may be revoked during the await and so we can not reliably use it as a barrier for all future awaits on the same timeline. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-6-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
With the addition of the inter-context intel_time.sync map, having a very similar sync_seqno[] is confusing. Aide the reader by denoting that this is a pre-allocated array for storing semaphore sync points wrt to the global seqno. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-5-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Track the latest fence waited upon on each context, and only add a new asynchronous wait if the new fence is more recent than the recorded fence for that context. This requires us to filter out unordered timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the absence of a universal identifier, we have to use our own i915->mm.unordered_timeline token. v2: Throw around the debug crutches v3: Inline the likely case of the pre-allocation cache being full. v4: Drop the pre-allocation support, we can lose the most recent fence in case of allocation failure -- it just means we may emit more awaits than strictly necessary but will not break. v5: Trim allocation size for leaf nodes, they only need an array of u32 not pointers. v6: Create mock_timeline to tidy selftest writing v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko) v8: Prune the stale sync points when we idle. v9: Include a small benchmark in the kselftests v10: Separate the idr implementation into its own compartment. (Tvrkto) v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto) v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves v13: kselftests to investigate struct i915_syncmap itself (Tvrtko) v14: Foray into ascii art graphs v15: Take into account that the random lookup/insert does 2 prng calls, not 1, when benchmarking, and use for_each_set_bit() (Tvrtko) v16: Improved ascii art Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Currently we filter out repeated use of the same timeline in the low level i915_gem_request_await_request(), after having added the dependency on the old request. However, we can lift this to i915_gem_request_await_dma_fence() (before the dependency is added) using the observation that requests along the same timeline are explicitly ordered via i915_add_request (along with the dependencies). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-3-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
By first unwrapping an incoming fence-array into its child fences, we can simplify the internal branching, and so avoid triggering a potential bug in the next patch when not squashing the child fences on the same timeline. It will also have the advantage of keeping the (top-level) fence arrays out of any fence/timeline caching since these are unordered timelines but with a random context id. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-2-chris@chris-wilson.co.uk
-
- 26 4月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
If we are enabling the breadcrumbs signaling prior to submitting the request, we know that we cannot have missed the interrupt and can therefore skip immediately waking the signaler to check. This reduces a significant chunk of the __i915_gem_request_submit() overhead for inter-engine synchronisation, for example in gem_exec_whisper. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170426080659.28771-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
-
- 25 4月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
We need to keep track of the last location we ask the hw to read up to (RING_TAIL) separately from our last write location into the ring, so that in the event of a GPU reset we do not tell the HW to proceed into a partially written request (which can happen if that request is waiting for an external signal before being executed). v2: Refactor intel_ring_reset() (Mika) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_fence/await-hang Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests") Fixes: d55ac5bf ("drm/i915: Defer transfer onto execution timeline to actual hw submission") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
-
- 22 4月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
Although we do check the completion-status of the request before actually adding a wait on it (either to its submit fence or its completion dma-fence), we currently do not check before adding it to the dependency lists. In fact, without checking for a completed request we may try to use the signaler after it has been retired and its dependency tree freed: [ 60.044057] BUG: KASAN: use-after-free in __list_add_valid+0x1d/0xd0 at addr ffff880348c9e6a0 [ 60.044118] Read of size 8 by task gem_exec_fence/530 [ 60.044164] CPU: 1 PID: 530 Comm: gem_exec_fence Tainted: G E 4.11.0-rc7+ #46 [ 60.044226] Hardware name: ��������������������������������� ���������������������������������/���������������������������������, BIOS RYBDWi35.86A.0246.2 [ 60.044290] Call Trace: [ 60.044337] dump_stack+0x4d/0x6a [ 60.044383] kasan_object_err+0x21/0x70 [ 60.044435] kasan_report+0x225/0x4e0 [ 60.044488] ? __list_add_valid+0x1d/0xd0 [ 60.044534] ? kasan_kmalloc+0xad/0xe0 [ 60.044587] __asan_load8+0x5e/0x70 [ 60.044639] __list_add_valid+0x1d/0xd0 [ 60.044788] __i915_priotree_add_dependency+0x67/0x130 [i915] [ 60.044895] i915_gem_request_await_request+0xa8/0x370 [i915] [ 60.044974] i915_gem_request_await_dma_fence+0x129/0x140 [i915] [ 60.045049] i915_gem_do_execbuffer.isra.37+0xb0a/0x26b0 [i915] [ 60.045077] ? save_stack+0xb1/0xd0 [ 60.045105] ? save_stack_trace+0x1b/0x20 [ 60.045132] ? save_stack+0x46/0xd0 [ 60.045158] ? kasan_kmalloc+0xad/0xe0 [ 60.045184] ? __kmalloc+0xd8/0x670 [ 60.045229] ? drm_ioctl+0x359/0x640 [drm] [ 60.045256] ? SyS_ioctl+0x41/0x70 [ 60.045330] ? i915_vma_move_to_active+0x540/0x540 [i915] [ 60.045360] ? tty_insert_flip_string_flags+0xa1/0xf0 [ 60.045387] ? tty_flip_buffer_push+0x63/0x70 [ 60.045414] ? remove_wait_queue+0xa9/0xc0 [ 60.045441] ? kasan_unpoison_shadow+0x35/0x50 [ 60.045467] ? kasan_kmalloc+0xad/0xe0 [ 60.045494] ? kasan_check_write+0x14/0x20 [ 60.045568] i915_gem_execbuffer2+0xdb/0x2a0 [i915] [ 60.045616] drm_ioctl+0x359/0x640 [drm] [ 60.045705] ? i915_gem_execbuffer+0x5a0/0x5a0 [i915] [ 60.045751] ? drm_version+0x150/0x150 [drm] [ 60.045778] ? compat_start_thread+0x60/0x60 [ 60.045805] ? plist_del+0xda/0x1a0 [ 60.045833] do_vfs_ioctl+0x12e/0x910 [ 60.045860] ? ioctl_preallocate+0x130/0x130 [ 60.045886] ? pci_mmcfg_check_reserved+0xc0/0xc0 [ 60.045913] ? vfs_write+0x196/0x240 [ 60.045939] ? __fget_light+0xa7/0xc0 [ 60.045965] SyS_ioctl+0x41/0x70 [ 60.045991] entry_SYSCALL_64_fastpath+0x17/0x98 [ 60.046017] RIP: 0033:0x7feb2baefc47 [ 60.046042] RSP: 002b:00007fff56d28e58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 60.046075] RAX: ffffffffffffffda RBX: 00007fff56d290a8 RCX: 00007feb2baefc47 [ 60.046102] RDX: 00007fff56d29050 RSI: 00000000c0406469 RDI: 0000000000000003 [ 60.046129] RBP: 00007fff56d29050 R08: 000055ecc4cd27d0 R09: 00007feb2bda8600 [ 60.046154] R10: 0000000000000073 R11: 0000000000000246 R12: 00000000c0406469 [ 60.046177] R13: 0000000000000003 R14: 000000000000000f R15: 0000000000000099 [ 60.046203] Object at ffff880348c9e680, in cache i915_dependency size: 64 [ 60.046225] Allocated: [ 60.046246] PID = 530 [ 60.046269] save_stack_trace+0x1b/0x20 [ 60.046292] save_stack+0x46/0xd0 [ 60.046318] kasan_kmalloc+0xad/0xe0 [ 60.046343] kasan_slab_alloc+0x12/0x20 [ 60.046368] kmem_cache_alloc+0xab/0x650 [ 60.046445] i915_gem_request_await_request+0x88/0x370 [i915] [ 60.046559] i915_gem_request_await_dma_fence+0x129/0x140 [i915] [ 60.046705] i915_gem_do_execbuffer.isra.37+0xb0a/0x26b0 [i915] [ 60.046849] i915_gem_execbuffer2+0xdb/0x2a0 [i915] [ 60.046936] drm_ioctl+0x359/0x640 [drm] [ 60.046987] do_vfs_ioctl+0x12e/0x910 [ 60.047038] SyS_ioctl+0x41/0x70 [ 60.047090] entry_SYSCALL_64_fastpath+0x17/0x98 [ 60.047139] Freed: [ 60.047179] PID = 530 [ 60.047223] save_stack_trace+0x1b/0x20 [ 60.047269] save_stack+0x46/0xd0 [ 60.047317] kasan_slab_free+0x72/0xc0 [ 60.047366] kmem_cache_free+0x39/0x160 [ 60.047512] i915_gem_request_retire+0x83f/0x930 [i915] [ 60.047657] i915_gem_request_alloc+0x166/0x600 [i915] [ 60.047799] i915_gem_do_execbuffer.isra.37+0xad8/0x26b0 [i915] [ 60.047897] i915_gem_execbuffer2+0xdb/0x2a0 [i915] [ 60.047942] drm_ioctl+0x359/0x640 [drm] [ 60.047968] do_vfs_ioctl+0x12e/0x910 [ 60.047993] SyS_ioctl+0x41/0x70 [ 60.048019] entry_SYSCALL_64_fastpath+0x17/0x98 [ 60.048044] Memory state around the buggy address: [ 60.048066] ffff880348c9e580: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 60.048105] ffff880348c9e600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 60.048138] >ffff880348c9e680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 60.048170] ^ [ 60.048191] ffff880348c9e700: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 60.048225] ffff880348c9e780: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc Note to hit the use-after-free requires us to be passed back a request via a fence-array, that is from explicit fencing accumulated into a sync-file fence-array. Fixes: 52e54209 ("drm/i915/scheduler: Record all dependencies upon request construction") Testcase: igt/gem_exec_fence/expired-history Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170422081537.6468-1-chris@chris-wilson.co.uk
-
- 15 4月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may use to indicate that it wants the contents of this buffer preserved in the error state (/sys/class/drm/cardN/error) following a GPU hang involving this batch. Use this at your discretion, the contents of the error state. although compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all eternity (until the error state is destroyed). Based on an earlier patch by Ben Widawsky <ben@bwidawsk.net> Testcase: igt/gem_exec_capture Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <ben@bwidawsk.net> Cc: Matt Turner <mattst88@gmail.com> Acked-by: NBen Widawsky <ben@bwidawsk.net> Acked-by: NMatt Turner <mattst88@gmail.com> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170415093902.22581-1-chris@chris-wilson.co.uk
-
- 07 4月, 2017 2 次提交
-
-
由 Chris Wilson 提交于
When we retire the last request on the ring, before we ever access that ring again we know it will be completely idle and so we can advance the ring->head fully to the end (i.e. ring->tail) and not just to the start of the breadcrumb. This allows us to skip re-emitting the breadcrumb after resetting the GPU if the ring was entirely idle. This prevents us from overwriting a seqno wraparound by re-executing a stale breadcrumb, i.e. submit_request(1) intel_engine_init_global_seqno(0) i915_reset() would then leave 1 in the HWS, but the next request to execute would also be with seqno 1. The sanity checks upon submission detect this as a timewarp and explode. By setting the ring as empty, upon reset the HWS is left as 0, leaving it consistent with the timeline. v2: Fix check for deleting last element of list. We know that this request is always the first element of the ring, so only if next points back to the start will this be the only request in flight. v3: Remove opencoding of list_is_last() v4: Move the block to its own function for some clarity. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144 Testcase: igt/gem_exec_whisper/hang-* Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170406170028.26871-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
由 Chris Wilson 提交于
When we update the global seqno (on the engine timeline), we modify HW state (both registers and mapped pages). As we do this, we should be sure that the HW is idle and we are not causing a conflict. The caller is supposed to wait_for_idle before calling us to update the seqno, so let's assert they have and the engine is indeed idle. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170405153055.28123-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 31 3月, 2017 3 次提交
-
-
由 Chris Wilson 提交于
We can merge the pair of loops over the engines and their timelines into a single loop, making it easier to read and more consistent with the commentary. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-6-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
由 Chris Wilson 提交于
Having added the wait upon each engine to idle into the central i915_gem_wait_for_idle(), we can remove the now redundant wait from reset_all_global_seqno(). This has the advantage of removing the late detection of an error (an engine still busy) which left the seqno reset only partially complete (though it should be safe enough!). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-5-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
由 Chris Wilson 提交于
As we now distinguish everywhere that can call i915_gem_retire_requests() following a successful wait_for_idle, we can remove the duplication by moving that call into i915_gem_wait_for_idle() itself. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
- 30 3月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
Michał Winiarski pointed out that the debugging infrastructure (such as trace_dma_fence_release) likes to pretty print the timeline name, long after we have freed the timeline. Our timelines currently live as part of the GTT (due to the strict ordering we currently use through each) which belong to the context. We aim to free the context and release its hardware resources as soon as we able to (i.e. when the last fence/request using it has been signaled and retired). As the .get_timeline_name is purely a debug feature, rather than extending the lifetime of the context, or splitting it into many different release phases just to keep the name around, replace the timeline name with a constant after the fence has been signaled. This avoids the potential use-after-free. Reported-by: NKrzysztof Olinski <krzysztof.e.olinski@intel.com> Fixes: 80b204bc ("drm/i915: Enable multiple timelines") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.10+ Link: http://patchwork.freedesktop.org/patch/msgid/20170330111614.29757-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
-
- 21 3月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
Storing the position of the breadcrumb of the last retired request as a separate last_retired_head is superfluous as we always copy that into head prior to recalculation of the intel_ring.space. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170321102552.24357-1-chris@chris-wilson.co.uk
-
- 17 3月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
I915_RESET_IN_PROGRESS is being used for both signaling the requirement to i915_mutex_lock_interruptible() to avoid taking the struct_mutex and to instruct a waiter (already holding the struct_mutex) to perform the reset. To allow for a little more coordination, split these two meaning into a couple of distinct flags. I915_RESET_BACKOFF tells i915_mutex_lock_interruptible() not to acquire the mutex and I915_RESET_HANDOFF tells the waiter to call i915_reset(). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NMichel Thierry <michel.thierry@intel.com> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170316171305.12972-1-chris@chris-wilson.co.uk
-
- 03 3月, 2017 2 次提交
-
-
由 Chris Wilson 提交于
During reset_all_global_seqno() on seqno rollover, we have to update the HWS. This causes all in flight requests to be completed, so first we wait. However, we were only waiting for the requests themselves to be completed and clearing out the waiter rbtrees - what I had missed was the extra reference in execlists->port[]. Since commit fe9ae7a3 ("drm/i915/execlists: Detect an out-of-order context switch") we can detect when the request is retired before the context switch interrupt is completed. The impact should be neglible outside of debugging. Testcase: igt/gem_exec_whisper Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170303121947.20482-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
-
由 Chris Wilson 提交于
Adding to the tail of the client request list as the only other user is in the throttle ioctl that iterates forwards over the list. It only needs protection against deletion of a request as it reads it, it simply won't see a new request added to the end of the list, or it would be too early and rejected. We can further reduce the number of spinlocks required when throttling by removing stale requests from the client_list as we throttle. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302122525.19675-1-chris@chris-wilson.co.uk
-
- 02 3月, 2017 4 次提交
-
-
由 Chris Wilson 提交于
assert_spin_locked() becomes an unconditionally compiled BUG_ON(), adding debug code right into the heart of critical routines like interrupt handlers. text data bss dec hex 1296480 19944 2272 1318696 141f28 before (lockdep disabled) 1295984 19944 2272 1318200 141d38 after 1336261 21139 3208 1360608 14c2e0 before (lockdep enabled) 1339920 21139 3208 1364267 14d12b after Small saving for release; hopefully more instructive in debug. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302132801.599-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
由 Chris Wilson 提交于
Everytime we take the fence->lock (aka request->lock), we must do so with irqs disabled since it may be used from within an hardirq context. As sometimes we are taking the lock in a nested manner, assert that the caller did disable the irqs for us. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170302115130.28434-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
-
由 Ingo Molnar 提交于
Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which will have to be picked up from other headers and .c files. Create a trivial placeholder <linux/sched/clock.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 2月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
As execlists and other non-semaphore multi-engine devices coordinate between engines using interrupts, we can shave off a few 10s of microsecond of scheduling latency by doing the fence signaling from the interrupt as opposed to a RT kthread. (Realistically the delay adds about 1% to an individual cross-engine workload.) We only signal the first fence in order to limit the amount of work we move into the interrupt handler. We also have to remember that our breadcrumbs may be unordered with respect to the interrupt and so we still require the waiter process to perform some heavyweight coherency fixups, as well as traversing the tree of waiters. v2: No need for early exit in irq handler - it breaks the flow between patches and prevents the tracepoint v3: Restore rcu hold across irq signaling of request Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
-
- 23 2月, 2017 12 次提交
-
-
由 Chris Wilson 提交于
Now that the code is getting simpler, we can reduce the indentation when waiting for the global_seqno. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-17-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
As we handoff the GPU reset to the waiter, we need to check we don't miss a wakeup if it has already been sent prior to us starting the wait. v2: Tweak checking for reset to be clear to the need before sleeping after changing the task state. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-16-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
-
由 Chris Wilson 提交于
Combine the common code for the pair of waiters into a single function. v2: Rename reset_request to wait_request_check_and_reset Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-15-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
-
由 Chris Wilson 提交于
If we change the wait_queue_t from using the autoremove_wake_function to the default_wake_function, we no longer have to restore the wait_queue_t entry on the wait_queue_head_t list after being woken up by it, as we are unusual in sleeping multiple times on the same wait_queue_t. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-14-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
After the request is cancelled, we then need to remove it from the global execution timeline and return it to the context timeline, the inverse of submit_request(). v2: Move manipulation of struct intel_wait to helpers Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-12-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
-
由 Chris Wilson 提交于
A request is assigned a global seqno only when it is on the hardware execution queue. The global seqno can be used to maintain a list of requests on the same engine in retirement order, for example for constructing a priority queue for waiting. Prior to its execution, or if it is subsequently removed in the event of preemption, its global seqno is zero. As both insertion and removal from the execution queue may operate in IRQ context, it is not guarded by the usual struct_mutex BKL. Instead those relying on the global seqno must be prepared for its value to change between reads. Only when the request is complete can the global seqno be stable (due to the memory barriers on submitting the commands to the hardware to write the breadcrumb, if the HWS shows that it has passed the global seqno and the global seqno is unchanged after the read, it is indeed complete). Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
On reflection, we are only using the execute fence as a waitqueue on the global_seqno and not using it for dependency tracking between fences (unlike the submit and dma fences). By only treating it as a waitqueue, we can then treat it similar to the other waitqueues during submit, making the code simpler. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-8-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
It had only one callsite and existed to keep the code clearer. Now having shared the wait-on-error between phases and with plans to change the wait-for-execute in the next few patches, remove the out of line wait loop and move it into the main body of i915_wait_request. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-7-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Add ourselves to the gpu error waitqueue earlier on, even before we determine we have to wait on the seqno. This is so that we can then share the waitqueue between stages in subsequent patches. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-6-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Use a local variable to avoid having to type out the full name of the gpu_error wait_queue. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-5-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Move the companion functions next to each other. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-4-chris@chris-wilson.co.uk
-
由 Chris Wilson 提交于
Replace the global device seqno with one for each engine, and account for in-flight seqno on each separately. This is consistent with dma-fence as each timeline has separate fence-contexts for each engine and a seqno is only ordered within a fence-context (i.e. seqno do not need to be ordered wrt to other engines, just ordered within a single engine). This is required to enable request rewinding for preemption on individual engines (we have to rewind the global seqno to avoid overflow, and we do not have to rewind all engines just to preempt one.) v2: Rename active_seqno to inflight_seqnos to more clearly indicate that it is a counter and not equivalent to the existing seqno. Update functions that operated on active_seqno similarly. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-3-chris@chris-wilson.co.uk
-
- 21 2月, 2017 2 次提交
-
-
由 Tvrtko Ursulin 提交于
These new tracepoints are emitted once the request is ready to be submitted to the GPU and once the request is about to be submitted to the GPU, respectively. Former condition triggers as soon as all the fences and dependencies have been resolved, and the latter once the backend is about to submit it to the GPU. New tracepoint are enabled via the new DRM_I915_LOW_LEVEL_TRACEPOINTS Kconfig option which is disabled by default to alleviate the performance impact concerns. v2: Move execute tracepoint to __i915_gem_request_submit. (Chris Wilson) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> -
由 Tvrtko Ursulin 提交于
Provide the same information as the other request event classes. v2: Pass in flags so we can properly report the blocking status. (Chris Wilson) v3: Log hex with 0x prefix for clarity. v4: Derive blocking status from flags. (Chris Wilson) Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
-
- 17 2月, 2017 1 次提交
-
-
由 Chris Wilson 提交于
If an interrupt has been posted, and we were spinning on the active seqno waiting for it to advance but it did not, then we can expect that it will not see its advance in the immediate future and should call into the irq-seqno barrier. We can stop spinning at this point, and leave the difficulty of handling the coherency to the caller. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-3-chris@chris-wilson.co.uk
-