- 02 March 2017, 1 commit

By Chris Wilson

Every time we take the fence->lock (aka request->lock), we must do so with irqs disabled, since it may be used from within a hardirq context. As we sometimes take the lock in a nested manner, assert that the caller disabled the irqs for us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170302115130.28434-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
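A minimal sketch of the locking rule being asserted, with illustrative lock names rather than the actual i915 structures: the outer caller owns the irq state, so the nested path only checks it instead of re-disabling.

```c
#include <linux/spinlock.h>
#include <linux/irqflags.h>
#include <linux/bug.h>

static DEFINE_SPINLOCK(outer);
static DEFINE_SPINLOCK(inner);	/* may also be taken from hardirq context */

static void nested_path(void)
{
	unsigned long flags;

	spin_lock_irqsave(&outer, flags);	/* irqs are now off */

	/* Nested acquisition: do not save the irq state again, just
	 * assert that the caller really did disable interrupts for us. */
	WARN_ON(!irqs_disabled());
	spin_lock(&inner);
	/* ... critical section safe against the hardirq user ... */
	spin_unlock(&inner);

	spin_unlock_irqrestore(&outer, flags);
}
```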
- 28 February 2017, 5 commits

By Chris Wilson

Move the setting of the gpu_error->missed_irq_ring bit to a common function, so that we get the debug logging for either path.

v2: Add %pF caller

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170228085018.3225-1-chris@chris-wilson.co.uk
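For reference, a hedged illustration of the v2 note: a common helper can name whoever called it, and %pF was the printk format for symbolic function pointers at the time (current kernels use %pS). The helper name is invented for the sketch.

```c
#include <linux/printk.h>

static void note_missed_irq(void)
{
	/* Log the immediate caller symbolically via %pF. */
	pr_debug("missed breadcrumb interrupt, caller %pF\n",
		 __builtin_return_address(0));
}
```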
By Chris Wilson

A significant cost in setting up a wait is the overhead of enabling the interrupt. As we disable the interrupt whenever the queue of waiters is empty, if we are frequently waiting on alternating batches we end up re-enabling the interrupt on a frequent basis. We do want to disable the interrupt during normal operations, as under high load it may add several thousand interrupts/s - we have been known in the past to occupy whole cores with our interrupt handler after accidentally leaving user interrupts enabled. As a compromise, leave the interrupt enabled until the next IRQ, or until the system is idle. This gives a small window for a waiter to keep the interrupt active and not be delayed by having to re-enable the interrupt.

v2: Restore hangcheck/missed-irq detection for continuations
v3: Be more careful restoring the hangcheck timer after reset
v4: Be more careful restoring the fake irq after reset (if required!)
v5: Redo changes to intel_engine_wakeup()
v6: Factor out __intel_engine_wakeup()
v7: Improve commentary for declaring a missed wakeup

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk
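A hedged sketch of the compromise described above, with simplified types (the real code tracks waiters in an rbtree, not a waitqueue): the interrupt stays armed after the last waiter leaves, and is only dropped by the first interrupt that finds nobody waiting.

```c
#include <linux/wait.h>

struct breadcrumbs {
	wait_queue_head_t waiters;	/* illustrative waiter queue */
	bool irq_enabled;
};

void user_irq_disable(struct breadcrumbs *b);	/* hypothetical hw masking */

/* Interrupt handler: rather than dropping the irq the moment the last
 * waiter leaves, keep it armed; only disable on the first interrupt
 * that finds nobody waiting (or when the engine idles). */
static void breadcrumb_irq(struct breadcrumbs *b)
{
	if (waitqueue_active(&b->waiters)) {
		wake_up(&b->waiters);
		return;
	}

	if (b->irq_enabled) {
		b->irq_enabled = false;
		user_irq_disable(b);
	}
}
```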
By Chris Wilson

By deferring hangcheck to the fake breadcrumb interrupt, we can simplify the enabling procedure slightly: by enabling the fake, we then enable the hangcheck. By always enabling the hangcheck from each fake interrupt (it will be a no-op for an already queued hangcheck), restoring the breadcrumbs after a reset becomes simpler in the next patch.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-3-chris@chris-wilson.co.uk
By Chris Wilson

As execlists and other non-semaphore multi-engine devices coordinate between engines using interrupts, we can shave off a few tens of microseconds of scheduling latency by doing the fence signaling from the interrupt, as opposed to from an RT kthread. (Realistically the delay adds about 1% to an individual cross-engine workload.) We only signal the first fence, in order to limit the amount of work we move into the interrupt handler. We also have to remember that our breadcrumbs may be unordered with respect to the interrupt, and so we still require the waiter process to perform some heavyweight coherency fixups, as well as traversing the tree of waiters.

v2: No need for early exit in irq handler - it breaks the flow between patches and prevents the tracepoint
v3: Restore rcu hold across irq signaling of request

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
By Chris Wilson

The two users of the return value from intel_engine_wakeup() expect different results. In the breadcrumbs hangcheck, we use it to determine whether wake_up_process() detected that the waiter was currently running (and if so, we presume that it hasn't yet missed the interrupt). However, in the fake_irq path, we use the return value as a check for whether there are any waiters, and so we may incorrectly stop the fake-irq if that waiter was currently running. To handle the two different needs, return both bits of information! We uninline it from the irq path in preparation for the next patch, which makes the irq hotpath special and relegates intel_engine_wakeup() to the slow fixup paths.

v2: s/ret/result/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-1-chris@chris-wilson.co.uk
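A hedged sketch of returning both facts at once - whether any waiter exists, and whether wake_up_process() actually found it asleep - packed as bit flags. The flag names mirror the intent described above; the helper itself is illustrative.

```c
#include <linux/bitops.h>
#include <linux/sched.h>

#define ENGINE_WAKEUP_WAITER	BIT(0)	/* a waiter was present */
#define ENGINE_WAKEUP_ASLEEP	BIT(1)	/* ...and it was not running */

static unsigned int engine_wakeup(struct task_struct *waiter)
{
	unsigned int result = 0;

	if (waiter) {
		result = ENGINE_WAKEUP_WAITER;
		/* wake_up_process() returns 1 only if the task was
		 * actually asleep; 0 means it was already runnable. */
		if (wake_up_process(waiter))
			result |= ENGINE_WAKEUP_ASLEEP;
	}
	return result;
}
```

The hangcheck path can then test ENGINE_WAKEUP_ASLEEP, while the fake-irq path tests ENGINE_WAKEUP_WAITER, without either misreading the other's signal.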
- 23 February 2017, 5 commits

By Chris Wilson

After the request is cancelled, we then need to remove it from the global execution timeline and return it to the context timeline, the inverse of submit_request().

v2: Move manipulation of struct intel_wait to helpers

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-12-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
By Chris Wilson

If we preempt a request and remove it from the execution queue, we need to undo its global seqno and restart any waiters.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-11-chris@chris-wilson.co.uk
By Chris Wilson

The plan in the near future is to allow requests to be removed from the signaler. We can then no longer rely on holding a reference to the request for the duration it is in the signaling tree, and must instead obtain a reference to the request for the current operation using RCU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-10-chris@chris-wilson.co.uk
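A hedged sketch of the RCU pattern described: hold the RCU read lock while peeking at the tree, and take a real reference only if the fence is still alive (dma_fence_get_rcu() fails once the refcount has already dropped to zero). The container type and lookup helper are hypothetical.

```c
#include <linux/rcupdate.h>
#include <linux/dma-fence.h>

struct signaler;					/* illustrative container */
struct dma_fence *first_signal(struct signaler *b);	/* hypothetical lookup */

static struct dma_fence *get_first_signal(struct signaler *b)
{
	struct dma_fence *fence;

	rcu_read_lock();
	fence = first_signal(b);	/* node may be removed/freed in parallel */
	if (fence)
		fence = dma_fence_get_rcu(fence); /* NULL once refcount hit zero */
	rcu_read_unlock();

	return fence;			/* caller drops with dma_fence_put() */
}
```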
By Chris Wilson

A request is assigned a global seqno only when it is on the hardware execution queue. The global seqno can be used to maintain a list of requests on the same engine in retirement order, for example for constructing a priority queue for waiting. Prior to its execution, or if it is subsequently removed in the event of preemption, its global seqno is zero. As both insertion into and removal from the execution queue may operate in IRQ context, it is not guarded by the usual struct_mutex BKL. Instead, those relying on the global seqno must be prepared for its value to change between reads. Only when the request is complete can the global seqno be stable: due to the memory barriers on submitting the commands to the hardware to write the breadcrumb, if the HWS shows that it has passed the global seqno and the global seqno is unchanged after the read, it is indeed complete.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-9-chris@chris-wilson.co.uk
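The completion test described above can be sketched as follows. The wrap-safe comparison matches i915's i915_seqno_passed(); the hws_seqno() helper standing in for the hardware status page read is hypothetical.

```c
#include <linux/compiler.h>
#include <linux/types.h>

/* Wrap-safe seqno comparison: true if seq1 is at or after seq2. */
static inline bool i915_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}

u32 hws_seqno(void);		/* hypothetical: breadcrumb read from the HWS */

static bool request_completed(const u32 *global_seqno)
{
	u32 seqno = READ_ONCE(*global_seqno);

	if (!seqno)		/* not on the hw queue (or preempted away) */
		return false;

	/* Complete only if the HWS has passed it AND the seqno did not
	 * change underneath us between the two reads. */
	return i915_seqno_passed(hws_seqno(), seqno) &&
	       READ_ONCE(*global_seqno) == seqno;
}
```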
By Chris Wilson

Replace the global device seqno with one for each engine, and account for in-flight seqnos on each separately. This is consistent with dma-fence, as each timeline has separate fence-contexts for each engine, and a seqno is only ordered within a fence-context (i.e. seqnos do not need to be ordered with respect to other engines, just ordered within a single engine). This is required to enable request rewinding for preemption on individual engines (we have to rewind the global seqno to avoid overflow, and we do not have to rewind all engines just to preempt one).

v2: Rename active_seqno to inflight_seqnos to more clearly indicate that it is a counter and not equivalent to the existing seqno. Update functions that operated on active_seqno similarly.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170223074422.4125-3-chris@chris-wilson.co.uk
- 17 February 2017, 3 commits

By Chris Wilson

As a backup to waiting on a user interrupt from the GPU, we use a heavy and frequent timer to wake up the waiting process should we detect an inconsistency whilst waiting. After seeing a "missed interrupt", the next time we wait, we restart the heavy timer. This patch is more reluctant to restart the timer, and will only do so if we have not seen any interrupts since the fake irq timer was started. If we are seeing interrupts, then the waiters are being woken normally; the incoherency that caused us to miss the wakeup last time is unlikely to reoccur, and so taking the risk of stalling again seems pragmatic.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-5-chris@chris-wilson.co.uk
By Chris Wilson

If the waiter was currently running, assume it hasn't had a chance to process the pending interrupt (e.g. a low-priority task on a loaded system), and wait until it sleeps before declaring a missed interrupt.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99816
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-4-chris@chris-wilson.co.uk
By Chris Wilson

When the timer expires for checking on interrupt processing, check whether any interrupts arrived within the last period. If real interrupts are still being delivered, we can be reassured that we haven't missed the final interrupt, as the waiter will still be woken. Only once all activity ceases do we have to worry about the waiter never being woken, and so need to install a timer to kick the waiter on the slow arrival of a seqno.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170217151304.16665-2-chris@chris-wilson.co.uk
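A hedged sketch of the idea: snapshot an interrupt counter when the check is armed, and on expiry treat "the counter moved" as proof that real interrupts are still flowing. All names are illustrative.

```c
#include <linux/atomic.h>

struct irq_watch {
	atomic_t irq_count;	/* bumped by the interrupt handler */
	unsigned int snapshot;	/* value when the timer was armed */
};

static void irq_watch_arm(struct irq_watch *w)
{
	w->snapshot = atomic_read(&w->irq_count);
}

/* Timer callback: only worry about a stuck waiter once irqs go quiet,
 * i.e. no interrupt has been seen since the timer was armed. */
static bool irq_watch_expired_quiet(struct irq_watch *w)
{
	return atomic_read(&w->irq_count) == w->snapshot;
}
```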
- 14 February 2017, 1 commit

By Chris Wilson

First retroactive test: make sure that the waiters are in global seqno order after random inserts and removals.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170213171558.20942-3-chris@chris-wilson.co.uk
- 13 February 2017, 1 commit

By Chris Wilson

The signal threads may be running concurrently with a GPU reset. Completion by the GPU runs asynchronously with the reset, and the two threads may see different snapshots of the state, so the signaler may mark a request as complete while we try to reset it. We don't tolerate two different views of the same state, and complain if we try to mark a request as failed when it is already complete. Disable the signal threads during reset to prevent this conflict (even though the conflict implies that the state we are resetting to is invalid, we have already made our decision!).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99733
References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170212172002.23072-4-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
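The quiescing mechanism can be sketched with kthread parking (hedged: the helper below is illustrative of the technique, not the i915 reset path itself).

```c
#include <linux/kthread.h>

/* Quiesce a helper thread across a reset so it cannot observe (and
 * signal) requests while we rewrite their state. */
static void reset_with_signaler_parked(struct task_struct *signaler)
{
	if (kthread_park(signaler))	/* fails only if the thread exited */
		return;

	/* ... perform the GPU reset, mark in-flight requests failed ... */

	kthread_unpark(signaler);	/* signaler resumes with fresh state */
}
```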
- 25 January 2017, 1 commit

By Chris Wilson

When this was introduced, I thought that reducing client latency from the signaler was the priority. Since its inception, the signaler has become responsible for keeping the execlists full, via the dma-fence. As this is very important for minimising overall execution time, signal the dma-fence first and then signal any waiting clients.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124110009.28947-8-chris@chris-wilson.co.uk
- 24 January 2017, 1 commit

By Chris Wilson

In the next patch, we will use the irq_posted technique for another engine interrupt; rather than using two members for the atomic updates, we can use two bits of one word instead. First, we need to update the breadcrumbs to use the new common engine->irq_posted.

v2: Use set_bit() rather than __set_bit() to ensure atomicity with respect to other bits in the mask

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170124151805.26146-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
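A hedged sketch of packing two independently updated flags into one word with atomic bitops, so a single unsigned long serves both interrupt sources. The bit names are illustrative.

```c
#include <linux/bitops.h>

#define IRQ_BREADCRUMB	0	/* user interrupt seen */
#define IRQ_EXECLIST	1	/* context-switch interrupt seen */

static unsigned long irq_posted;

/* Interrupt handler: set_bit() is atomic, so a concurrent update to the
 * other bit in the same word cannot be lost (unlike __set_bit()). */
static void irq_handler(void)
{
	set_bit(IRQ_BREADCRUMB, &irq_posted);
}

/* Bottom half: consume the event atomically. */
static bool irq_pending(void)
{
	return test_and_clear_bit(IRQ_BREADCRUMB, &irq_posted);
}
```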
- 23 January 2017, 1 commit

By Chris Wilson

Ensure that the hangcheck is queued even in the absence of interrupts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170123093724.18592-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
- 20 December 2016, 1 commit

By Chris Wilson

In keeping with commit f802cf7e ("drm/i915/debugfs: use rb_entry()"), convert the primary user of the rbtrees over to using rb_entry rather than the equivalent container_of.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20161220104003.8044-1-chris@chris-wilson.co.uk
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
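rb_entry() is container_of() specialised for rb_node members; a hedged example of walking a waiter tree with it, using an illustrative node type:

```c
#include <linux/rbtree.h>

struct intel_wait {			/* illustrative waiter node */
	struct rb_node node;
	unsigned int seqno;
};

static struct intel_wait *first_wait(struct rb_root *waiters)
{
	struct rb_node *rb = rb_first(waiters);

	/* rb_entry(ptr, type, member) expands to container_of(), but
	 * names the intent when working with rbtrees. */
	return rb ? rb_entry(rb, struct intel_wait, node) : NULL;
}
```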
- 21 November 2016, 1 commit

By Chris Wilson

When unloading the module, it is expected that we have finished executing all requests, and so the signal threads should be idle. Add a warning in case there are any residual requests in the signaler rbtrees at that point.

v2: We can also warn if there are any waiters

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161121110759.22896-1-chris@chris-wilson.co.uk
- 09 November 2016, 1 commit

By Chris Wilson

When we need to reset the global seqno on wraparound, we have to wait until the current rbtrees are drained (otherwise the next waiter would be out of sequence). The current mechanism, to kick and spin until complete, may exit too early, as it would break if the target thread was currently running. Instead, we must wake up the threads but keep spinning until the trees have been deleted. In order to appease Tvrtko, busy spin rather than yield().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161108143719.32215-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
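A hedged sketch of the kick-and-spin loop: keep waking the waiters and busy-spinning until the tree is observed empty, rather than trusting a single wake_up_process() result. The wake helper is hypothetical.

```c
#include <linux/sched.h>
#include <linux/compiler.h>
#include <linux/rbtree.h>

void wake_all_waiters(struct rb_root *waiters);	/* hypothetical */

/* Busy-spin until every waiter has removed itself from the tree; only
 * then is it safe to reset the global seqno. */
static void drain_waiters(struct rb_root *waiters)
{
	while (READ_ONCE(waiters->rb_node)) {
		wake_all_waiters(waiters);
		cpu_relax();	/* busy spin, deliberately not yield() */
	}
}
```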
- 29 October 2016, 2 commits

By Chris Wilson

The breadcrumbs are about to be used from within IRQ context sections (e.g. nouveau signals a fence from an interrupt handler, causing us to submit a new request) and/or from bottom-half tasklets (i.e. intel_lrc_irq_handler), therefore we need to employ the irqsafe spinlock variants. For example, deferring the request submission to the intel_lrc_irq_handler generates this trace:

[ 66.388639] =================================
[ 66.388650] [ INFO: inconsistent lock state ]
[ 66.388663] 4.9.0-rc2+ #56 Not tainted
[ 66.388672] ---------------------------------
[ 66.388682] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 66.388695] swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 66.388706] (&(&b->lock)->rlock){+.?...}, at: [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
[ 66.388761] {SOFTIRQ-ON-W} state was registered at:
[ 66.388783] [<ffffffff810bd842>] __lock_acquire+0x682/0x1870
[ 66.388803] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
[ 66.388824] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
[ 66.388845] [<ffffffff81401e41>] intel_engine_reset_breadcrumbs+0x21/0xb0
[ 66.388866] [<ffffffff81403ae7>] gen8_init_common_ring+0x67/0x100
[ 66.388887] [<ffffffff81403b92>] gen8_init_render_ring+0x12/0x60
[ 66.388912] [<ffffffff813f8707>] i915_gem_init_hw+0xf7/0x2a0
[ 66.388936] [<ffffffff813f899b>] i915_gem_init+0xbb/0xf0
[ 66.388959] [<ffffffff813b4980>] i915_driver_load+0x7e0/0x1330
[ 66.388988] [<ffffffff813c09d8>] i915_pci_probe+0x28/0x40
[ 66.389013] [<ffffffff812fa0db>] pci_device_probe+0x8b/0xf0
[ 66.389037] [<ffffffff8147737e>] driver_probe_device+0x21e/0x430
[ 66.389065] [<ffffffff8147766e>] __driver_attach+0xde/0xe0
[ 66.389090] [<ffffffff814751ad>] bus_for_each_dev+0x5d/0x90
[ 66.389113] [<ffffffff81477799>] driver_attach+0x19/0x20
[ 66.389144] [<ffffffff81475ced>] bus_add_driver+0x15d/0x260
[ 66.389168] [<ffffffff81477e3b>] driver_register+0x5b/0xd0
[ 66.389281] [<ffffffff812fa19b>] __pci_register_driver+0x5b/0x60
[ 66.389312] [<ffffffff81aed333>] i915_init+0x3e/0x45
[ 66.389336] [<ffffffff81ac2ffa>] do_one_initcall+0x8b/0x118
[ 66.389359] [<ffffffff81ac323a>] kernel_init_freeable+0x1b3/0x23b
[ 66.389387] [<ffffffff8160fc39>] kernel_init+0x9/0x100
[ 66.389411] [<ffffffff816180e7>] ret_from_fork+0x27/0x40
[ 66.389426] irq event stamp: 315865
[ 66.389438] hardirqs last enabled at (315864): [<ffffffff816178f1>] _raw_spin_unlock_irqrestore+0x31/0x50
[ 66.389469] hardirqs last disabled at (315865): [<ffffffff816176b3>] _raw_spin_lock_irqsave+0x13/0x50
[ 66.389499] softirqs last enabled at (315818): [<ffffffff8107a04c>] _local_bh_enable+0x1c/0x50
[ 66.389530] softirqs last disabled at (315819): [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
[ 66.389559] other info that might help us debug this:
[ 66.389580] Possible unsafe locking scenario:
[ 66.389598]       CPU0
[ 66.389609]       ----
[ 66.389620]  lock(&(&b->lock)->rlock);
[ 66.389650]  <Interrupt>
[ 66.389661]    lock(&(&b->lock)->rlock);
[ 66.389690] *** DEADLOCK ***
[ 66.389715] 2 locks held by swapper/1/0:
[ 66.389728] #0: (&(&tl->lock)->rlock){..-...}, at: [<ffffffff81403e01>] intel_lrc_irq_handler+0x201/0x3c0
[ 66.389785] #1: (&(&req->lock)->rlock/1){..-...}, at: [<ffffffff813fc0af>] __i915_gem_request_submit+0x8f/0x170
[ 66.389854] stack backtrace:
[ 66.389959] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc2+ #56
[ 66.389976] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 66.389999] ffff88027fd03c58 ffffffff812beae5 ffff88027696e680 ffffffff822afe20
[ 66.390036] ffff88027fd03ca8 ffffffff810bb420 0000000000000001 0000000000000000
[ 66.390070] 0000000000000000 0000000000000006 0000000000000004 ffff88027696ee10
[ 66.390104] Call Trace:
[ 66.390117] <IRQ>
[ 66.390128] [<ffffffff812beae5>] dump_stack+0x68/0x93
[ 66.390147] [<ffffffff810bb420>] print_usage_bug+0x1d0/0x1e0
[ 66.390164] [<ffffffff810bb8a0>] mark_lock+0x470/0x4f0
[ 66.390181] [<ffffffff810ba9d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
[ 66.390203] [<ffffffff810bd75d>] __lock_acquire+0x59d/0x1870
[ 66.390221] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
[ 66.390237] [<ffffffff810bedbc>] ? lock_acquire+0x6c/0xb0
[ 66.390255] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
[ 66.390273] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
[ 66.390291] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
[ 66.390309] [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
[ 66.390327] [<ffffffff813fc170>] __i915_gem_request_submit+0x150/0x170
[ 66.390345] [<ffffffff81403e8b>] intel_lrc_irq_handler+0x28b/0x3c0
[ 66.390363] [<ffffffff81079d97>] tasklet_action+0x57/0xc0
[ 66.390380] [<ffffffff8107a249>] __do_softirq+0x119/0x240
[ 66.390396] [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
[ 66.390414] [<ffffffff8101afd5>] do_IRQ+0x65/0x110
[ 66.390431] [<ffffffff81618806>] common_interrupt+0x86/0x86
[ 66.390446] <EOI>
[ 66.390457] [<ffffffff814ec6d1>] ? cpuidle_enter_state+0x151/0x200
[ 66.390480] [<ffffffff814ec7a2>] cpuidle_enter+0x12/0x20
[ 66.390498] [<ffffffff810b639e>] call_cpuidle+0x1e/0x40
[ 66.390516] [<ffffffff810b65ae>] cpu_startup_entry+0x10e/0x1f0
[ 66.390534] [<ffffffff81036133>] start_secondary+0x103/0x130

(This is split out of the defer-global-seqno-allocation patch due to the realisation that we need a more complete conversion if we want to defer request submission even further.)

v2: lockdep was warning about mixed SOFTIRQ contexts, not HARDIRQ contexts, so we only need to use spin_lock_bh and not disable interrupts.
v3: We need full irq protection as we may be called from a third party interrupt handler (via fences).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-32-chris@chris-wilson.co.uk
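The fix itself is the standard irqsafe locking pattern; a minimal hedged sketch with an illustrative lock:

```c
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(b_lock);	/* e.g. the breadcrumbs lock */

/* Process context: must block out local interrupts, because the same
 * lock is taken from irq/tasklet context below. */
static void process_context(void)
{
	unsigned long flags;

	spin_lock_irqsave(&b_lock, flags);
	/* ... */
	spin_unlock_irqrestore(&b_lock, flags);
}

/* Hard-irq context: interrupts are already off, so a plain spin_lock
 * suffices here - but only because every other user is irqsafe. */
static void irq_context(void)
{
	spin_lock(&b_lock);
	/* ... */
	spin_unlock(&b_lock);
}
```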
By Chris Wilson

Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. It keeps tracking the execution of requests simple, and is vital for preserving a single waiter (i.e. so that we can order the waiters such that only the earliest to wake up need be woken). To accomplish this, we distinguish the seqno used to order requests per-context (external) from that used internally for execution.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
- 25 October 2016, 1 commit

By Chris Wilson

I plan to usurp the short name of struct fence for a core kernel struct, and so I need to rename the specialised fence/timeline for DMA operations to make room. A consensus was reached in https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html that making clear this fence applies to DMA operations was a good thing. Since then the patch has grown a bit as usage increases, so hopefully it remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:

@@
@@
- struct fence
+ struct dma_fence
@@
@@
- struct fence_ops
+ struct dma_fence_ops
@@
@@
- struct fence_cb
+ struct dma_fence_cb
@@
@@
- struct fence_array
+ struct dma_fence_array
@@
@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@
@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
( ... )

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
- 14 October 2016, 1 commit

By Akash Goel

With the possibility of many more rings being added in future, the drm_i915_private structure could bloat, as an array of type intel_engine_cs is embedded inside it:

struct intel_engine_cs engine[I915_NUM_ENGINES];

Though this is still fine, as generally there is only a single instance of the drm_i915_private structure in use, not all of the possible rings would be enabled or active on most platforms. Some memory can be saved by allocating the intel_engine_cs structure only for the enabled/active engines. Currently the engine/ring ID is kept static, and dev_priv->engine[] is simply indexed using the enums defined in intel_engine_id. To save memory while continuing to use static engine/ring IDs, 'engine' is defined as an array of pointers:

struct intel_engine_cs *engine[I915_NUM_ENGINES];

dev_priv->engine[engine_ID] will be NULL for disabled engine instances. There is a text size reduction of 928 bytes, from 1028200 to 1027272, for the i915.o file (for the i915.ko file the text size remains the same, 1193131 bytes).

v2:
- Remove the engine iterator field added in the drm_i915_private structure; instead pass a local iterator variable to the for_each_engine** macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the NULL pointer check on the engine pointer. (Chris)

v3:
- Remove the for_each_engine_id() macro, as the updated macro for_each_engine() can be used in place of it. (Chris)
- Protect the access to the Render engine Fault register with a NULL check, as engine-specific init is done later in the driver load sequence.

v4:
- Use the !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().

v5:
- Clean up intel_engines_init() & intel_engines_setup() with respect to the allocation of the intel_engine_cs structure. (Chris)

v6: Rebase.

v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of the 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.

v8: Rebase.
v9: Rebase.

v10:
- For index calculation use the engine ID instead of pointer-based arithmetic in intel_engine_sync_index(), as engine pointers are not contiguous now. (Chris)
- For appropriateness, rename the local enum variable 'iter' to 'id'. (Joonas)
- Use the for_each_engine macro for cleanup in intel_engines_init() and remove the check for a NULL engine pointer in the cleanup() routines. (Joonas)

v11: Rebase.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
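A hedged sketch of the data-structure change and a NULL-skipping iterator (simplified stand-ins; the real definitions live in the i915 headers):

```c
#define I915_NUM_ENGINES 5

struct intel_engine_cs {
	int id;
	/* ... */
};

struct drm_i915_private {
	/* Array of pointers: disabled engines stay NULL instead of
	 * embedding I915_NUM_ENGINES full structs in dev_priv. */
	struct intel_engine_cs *engine[I915_NUM_ENGINES];
};

/* Iterate only over allocated (enabled) engines; the empty-if/else
 * trick keeps the macro safe against dangling-else surprises. */
#define for_each_engine(engine__, i915__, id__) \
	for ((id__) = 0; (id__) < I915_NUM_ENGINES; (id__)++) \
		if (!((engine__) = (i915__)->engine[id__])) {} else
```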
- 10 October 2016, 1 commit

By Chris Wilson

Along with the interrupt, we want to restore the fake-irq and wait-timeout detection. If we use the breadcrumbs interface to set up the interrupt as it wants, the auxiliary timers will also be restored.

v2: Cancel both timers as well, sanitize the IMR.

Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk
(cherry picked from commit ad07dfcd)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
- 07 October 2016, 1 commit

By Chris Wilson

Along with the interrupt, we want to restore the fake-irq and wait-timeout detection. If we use the breadcrumbs interface to set up the interrupt as it wants, the auxiliary timers will also be restored.

v2: Cancel both timers as well, sanitize the IMR.

Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161007065327.24515-3-chris@chris-wilson.co.uk
- 09 September 2016, 1 commit

By Chris Wilson

Drive final request submission from a callback from the fence. This way the request is queued until all dependencies are resolved, at which point it is handed to the backend for queueing to hardware. At this point, no dependencies are set on the request, so the callback is immediate. A side effect of imposing a heavier irqsafe spinlock for execlist submission is that we lose the softirq enabling after scheduling the execlists tasklet. To compensate, we manually kickstart the softirq by disabling and enabling the bh around the fence signaling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160909131201.16673-14-chris@chris-wilson.co.uk
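The softirq kick mentioned at the end can be sketched as below (hedged: the helper is illustrative, and at the time the API was still spelled fence_signal). The trick is that local_bh_enable() runs any softirqs, such as a freshly scheduled tasklet, that became pending inside the bh-disabled section.

```c
#include <linux/interrupt.h>
#include <linux/dma-fence.h>

/* Signalling the fence may schedule the execlists tasklet from a
 * context that will not otherwise process softirqs soon; wrapping the
 * signal in a bh-disable/enable pair makes local_bh_enable() run any
 * pending softirqs on the way out. */
static void signal_and_kick(struct dma_fence *fence)
{
	local_bh_disable();
	dma_fence_signal(fence);
	local_bh_enable();	/* kick the tasklet/softirq now */
}
```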
- 10 August 2016, 2 commits

By Chris Wilson

The bottom-half we use for processing the breadcrumb interrupt is a task, which is an RCU-protected struct. When accessing this struct, we need to be holding the RCU read lock to prevent it disappearing beneath us. We can use the RCU annotation to mark our irq_seqno_bh pointer as being under RCU guard, and then use the RCU accessors to provide correct ordering of access through the pointer. Most notably, this fixes the access from hard irq context to use the RCU read lock, which both Daniel and Tvrtko complained about.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-3-git-send-email-chris@chris-wilson.co.uk
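A hedged sketch of the annotation pattern described: mark the pointer __rcu, publish it with rcu_assign_pointer(), and read it under the RCU read lock with rcu_dereference(). The struct layout and helper names are illustrative.

```c
#include <linux/rcupdate.h>
#include <linux/sched.h>

struct breadcrumbs {
	struct task_struct __rcu *irq_seqno_bh;	/* active bottom-half */
};

static void set_bottom_half(struct breadcrumbs *b, struct task_struct *tsk)
{
	rcu_assign_pointer(b->irq_seqno_bh, tsk);	/* ordered publish */
}

/* Called from hard-irq context: the task may exit at any time, so only
 * touch it inside the RCU read-side critical section. */
static void wake_bottom_half(struct breadcrumbs *b)
{
	struct task_struct *tsk;

	rcu_read_lock();
	tsk = rcu_dereference(b->irq_seqno_bh);
	if (tsk)
		wake_up_process(tsk);
	rcu_read_unlock();
}
```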
By Chris Wilson

In commit 2529d570 ("drm/i915: Drop racy markup of missed-irqs from idle-worker"), the racy detection of missed interrupts was removed when we went idle. This, however, opened up the issue that stuck waiters were not being reported, causing a test case failure. If we move the stuck-waiter detection out of hangcheck and into the breadcrumb mechanism (i.e. the waiter) itself, we can avoid this issue entirely. This leaves hangcheck looking for a stuck GPU (inspecting for request advancement and HEAD motion), and breadcrumbs looking for a stuck waiter - hopefully making both easier to understand by their segregation.

v2: Reduce the error message, as we now run independently of hangcheck, and the hanging batch used by igt also counts as a stuck waiter, causing extra warnings in dmesg.
v3: Move the breadcrumb's hangcheck kickstart to the first missed wait.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104
Fixes: 2529d570 ("drm/i915: Drop racy markup of missed-irqs from idle-worker")
Testcase: igt/drv_missed_irq
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
- 26 July 2016, 1 commit

By Chris Wilson

Since intel_engine_enable_signaling() is now only called via fence_enable_sw_signaling(), we can rely on that to provide serialisation and run-once for us, and so make ourselves slightly simpler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-2-git-send-email-chris@chris-wilson.co.uk
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469530913-17180-1-git-send-email-chris@chris-wilson.co.uk
- 24 July 2016, 1 commit

By Chris Wilson

In order to close a race between a long-running hangcheck comparing a stale interrupt counter with a just-started waiter, we need to first bump the counter as we start the fresh wait.

References: https://bugs.freedesktop.org/show_bug.cgi?id=96974
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469351421-13493-2-git-send-email-chris@chris-wilson.co.uk
- 20 July 2016, 2 commits

By Chris Wilson

Now that we derive requests from struct fence, swap over to its nomenclature for references. It's shorter and more idiomatic across the kernel.

s/i915_gem_request_reference/i915_gem_request_get/
s/i915_gem_request_unreference/i915_gem_request_put/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-2-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-1-git-send-email-chris@chris-wilson.co.uk
By Chris Wilson

dma-buf provides a generic fence class for interoperation between drivers. Internally we use the request structure as a fence, and so with only a little bit of interfacing we can rebase those requests on top of dma-buf fences. This will allow us, in the future, to pass those fences back to userspace or between drivers.

v2: The fence_context needs to be globally unique, not just unique to this device.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk
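A hedged, minimal sketch of backing a driver object with the fence class (shown against the later dma_fence spelling; the request type, ops bodies, and which callbacks are mandatory vary by kernel version):

```c
#include <linux/dma-fence.h>
#include <linux/spinlock.h>

struct my_request {			/* illustrative request type */
	struct dma_fence fence;
	spinlock_t lock;
};

static const char *my_driver_name(struct dma_fence *f)
{
	return "sketch";
}

static const char *my_timeline_name(struct dma_fence *f)
{
	return "engine0";
}

static const struct dma_fence_ops my_fence_ops = {
	.get_driver_name = my_driver_name,
	.get_timeline_name = my_timeline_name,
};

static void my_request_init(struct my_request *rq, u64 context, u64 seqno)
{
	spin_lock_init(&rq->lock);
	/* The request is now a first-class fence: it can be exported,
	 * waited on, and signalled via the generic dma-fence API. */
	dma_fence_init(&rq->fence, &my_fence_ops, &rq->lock, context, seqno);
}
```

The context would typically come from dma_fence_context_alloc(1), which hands out globally unique fence-context ids, matching the v2 note above.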
- 14 July 2016, 1 commit

By Chris Wilson

Never go to sleep waiting on the GPU without first ensuring that we will get woken up. We have a choice of queuing the hangcheck before every schedule() or on the first time we wake up. In order to simply accommodate both the signaler and the ordinary waiter, move the queuing to the common point of enabling the irq. We lose the paranoid safety of ensuring that the hangcheck is active before the sleep, but avoid code duplication (and redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d4613 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit 232af392)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
- 11 July 2016, 1 commit

By Chris Wilson

Never go to sleep waiting on the GPU without first ensuring that we will get woken up. We have a choice of queuing the hangcheck before every schedule() or on the first time we wake up. In order to simply accommodate both the signaler and the ordinary waiter, move the queuing to the common point of enabling the irq. We lose the paranoid safety of ensuring that the hangcheck is active before the sleep, but avoid code duplication (and redundant hangcheck queuing).

Testcase: igt/prime_busy
Fixes: c81d4613 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468055535-19740-2-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
- 06 July 2016, 2 commits

By Chris Wilson

As we inspect both the tasklet (to check for an active bottom-half) and set the irq-posted flag at the same time (both in the interrupt handler and then in the bottom-half), group those two together into the same cacheline. (Not having total control over placement of the struct means we can't guarantee the cacheline boundary; we would need to align the kmalloc and then each struct, but the grouping should help.)

v2: Try a couple of different names for the state touched by the user interrupt handler.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-3-git-send-email-chris@chris-wilson.co.uk
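A hedged sketch of the layout intent (field names illustrative): keep the fields the irq handler touches together so they tend to share one cache line.

```c
#include <linux/cache.h>
#include <linux/sched.h>

struct breadcrumbs {
	/* Fields touched together from the interrupt handler are grouped
	 * so they (likely) share one cache line; exact placement still
	 * depends on the allocation alignment. */
	struct {
		struct task_struct *irq_seqno_bh; /* active bottom-half */
		bool irq_posted;		  /* interrupt-seen flag */
	} irq ____cacheline_aligned_in_smp;

	/* ... colder, waiter-only fields follow ... */
};
```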
By Chris Wilson

After assigning ourselves as the new bottom-half, we must perform a cursory check to prevent a missed interrupt. Either we miss the interrupt whilst programming the hardware, or, if there was a previous waiter (for a later seqno), they may be woken instead of us (due to the inherent race in the unlocked read of b->tasklet in the irq handler) and so we miss the wake-up.

Spotted-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96806
Fixes: 688e6c72 ("drm/i915: Slaughter the thundering... herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-1-git-send-email-chris@chris-wilson.co.uk
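The race and its cure can be sketched as follows (hedged: all names are hypothetical; the point is the ordering - publish ourselves as bottom-half first, then re-check the breadcrumb so an interrupt that fired in between is not lost).

```c
#include <linux/sched.h>

struct breadcrumbs;				/* illustrative */
void set_bottom_half(struct breadcrumbs *b, struct task_struct *tsk);
bool breadcrumb_passed(struct breadcrumbs *b);	/* hypothetical seqno check */

static bool become_bottom_half(struct breadcrumbs *b)
{
	set_bottom_half(b, current);	/* from here on, irqs will wake us */

	/* Cursory re-check: an interrupt may have been delivered while
	 * we were installing ourselves (or may have woken the previous
	 * waiter instead of us). Without this we could sleep forever. */
	return breadcrumb_passed(b);
}
```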
- 02 July 2016, 1 commit

By Chris Wilson

With only a single callsite for intel_engine_cs->irq_get and ->irq_put, we can reduce the code size by moving the common preamble into the caller, and we can also eliminate the reference counting. For completeness, as we are no longer doing reference counting on the irq, rename the get/put vfunctions to enable/disable respectively, and take the opportunity to review the use of posting reads. We only require the serialisation with hardware when enabling the interrupt (i.e. so we cannot miss an interrupt by going to sleep before the hardware truly enables it).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-18-git-send-email-chris@chris-wilson.co.uk
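A hedged illustration of the posting-read rule the commit describes: flush the enable write before sleeping, but skip the read when disabling, where a late interrupt is harmless. The register offset is invented for the sketch.

```c
#include <linux/io.h>
#include <linux/types.h>

#define IMR 0x0a8		/* invented interrupt-mask offset */

static void irq_enable(void __iomem *regs, u32 mask)
{
	writel(mask, regs + IMR);
	/* Posting read: serialise with the hardware so the interrupt is
	 * truly unmasked before we go to sleep waiting for it. */
	(void)readl(regs + IMR);
}

static void irq_disable(void __iomem *regs)
{
	/* No posting read needed: a stray late interrupt is harmless. */
	writel(0, regs + IMR);
}
```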