1. 03 Mar, 2017 2 commits
  2. 28 Feb, 2017 3 commits
    • C
      drm/i915: Delay disabling the user interrupt for breadcrumbs · 67b807a8
      Chris Wilson authored
      A significant cost in setting up a wait is the overhead of enabling the
      interrupt. As we disable the interrupt whenever the queue of waiters is
      empty, if we are frequently waiting on alternating batches, we end up
      re-enabling the interrupt on a frequent basis. We do want to disable the
      interrupt during normal operations as under high load it may add several
      thousand interrupts/s - we have been known in the past to occupy whole
      cores with our interrupt handler after accidentally leaving user
      interrupts enabled. As a compromise, leave the interrupt enabled until
      the next IRQ, or the system is idle. This gives a small window for a
      waiter to keep the interrupt active and not be delayed by having to
      re-enable the interrupt.
      
      v2: Restore hangcheck/missed-irq detection for continuations
      v3: Be more careful restoring the hangcheck timer after reset
      v4: Be more careful restoring the fake irq after reset (if required!)
      v5: Redo changes to intel_engine_wakeup()
      v6: Factor out __intel_engine_wakeup()
      v7: Improve commentary for declaring a missed wakeup
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-4-chris@chris-wilson.co.uk
      67b807a8
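The compromise described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct breadcrumbs`, `add_waiter()`, `remove_waiter()` and `on_user_interrupt()` are hypothetical names invented for the illustration and are not the i915 functions; the real driver also handles locking, hangcheck and fake-irq fallbacks, all omitted here.

```c
#include <stdbool.h>

/* Hypothetical model of per-engine breadcrumb state (not the i915 struct). */
struct breadcrumbs {
	unsigned int waiters;     /* tasks currently waiting on a seqno */
	bool irq_enabled;         /* user interrupt currently unmasked */
	unsigned int irq_enables; /* how often we paid the enable cost */
};

/* Adding a waiter only pays the enable cost if the interrupt is off. */
static void add_waiter(struct breadcrumbs *b)
{
	b->waiters++;
	if (!b->irq_enabled) {
		b->irq_enabled = true;
		b->irq_enables++;
	}
}

/* Removing the last waiter leaves the interrupt enabled for now. */
static void remove_waiter(struct breadcrumbs *b)
{
	if (b->waiters)
		b->waiters--;
}

/* The next interrupt (or the idle path) disables it if nobody is waiting. */
static void on_user_interrupt(struct breadcrumbs *b)
{
	if (b->waiters == 0)
		b->irq_enabled = false;
}
```

With this model, two back-to-back waits that are not separated by an interrupt cost a single enable, whereas disabling eagerly in `remove_waiter()` would cost two.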
    • C
      drm/i915: Signal first fence from irq handler if complete · 56299fb7
      Chris Wilson authored
      As execlists and other non-semaphore multi-engine devices coordinate
      between engines using interrupts, we can shave off a few tens of
      microseconds of scheduling latency by doing the fence signaling from the
      interrupt as opposed to a RT kthread. (Realistically the delay adds
      about 1% to an individual cross-engine workload.) We only signal the
      first fence in order to limit the amount of work we move into the
      interrupt handler. We also have to remember that our breadcrumbs may be
      unordered with respect to the interrupt and so we still require the
      waiter process to perform some heavyweight coherency fixups, as well as
      traversing the tree of waiters.
      
      v2: No need for early exit in irq handler - it breaks the flow between
      patches and prevents the tracepoint
      v3: Restore rcu hold across irq signaling of request
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-2-chris@chris-wilson.co.uk
      56299fb7
    • C
      drm/i915: Report both waiters and success from intel_engine_wakeup() · 8d769ea7
      Chris Wilson authored
      The two users of the return value from intel_engine_wakeup() are
      expecting different results. In the breadcrumbs hangcheck, we are using
      it to determine whether wake_up_process() detected the waiter was
      currently running (and if so we presume that it hasn't yet missed the
      interrupt). However, in the fake_irq path, we are using the return value
      as a check as to whether there are any waiters, and so we may
      incorrectly stop the fake-irq if that waiter was currently running.
      
      To handle the two different needs, return both bits of information! We
      uninline it from the irq path in preparation for the next patch which
      makes the irq hotpath special and relegates intel_engine_wakeup() to the
      slow fixup paths.
      
      v2: s/ret/result/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170227205850.2828-1-chris@chris-wilson.co.uk
      8d769ea7
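Returning "both bits of information" can be sketched as a small bitmask result. The flag names, `struct waiter` and `wake_up()` below are assumptions made for this illustration; `wake_up()` merely stands in for `wake_up_process()`, which reports whether the task actually had to be woken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names for this sketch. */
#define WAKEUP_WAITER (1u << 0) /* there was a waiter to wake */
#define WAKEUP_ASLEEP (1u << 1) /* that waiter was asleep and got woken */

struct waiter {
	bool asleep;
};

/* Stand-in for wake_up_process(): true only if the task was asleep. */
static bool wake_up(struct waiter *w)
{
	bool was_asleep = w->asleep;

	w->asleep = false;
	return was_asleep;
}

/* Report both facts so that each caller can test the bit it cares about. */
static unsigned int engine_wakeup(struct waiter *first)
{
	unsigned int result = 0;

	if (first) {
		result |= WAKEUP_WAITER;
		if (wake_up(first))
			result |= WAKEUP_ASLEEP;
	}
	return result;
}
```

In this model the fake-irq path would keep running while `WAKEUP_WAITER` is set, and the hangcheck path would treat `WAKEUP_ASLEEP` as the sign of a possibly missed interrupt.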
  3. 23 Feb, 2017 4 commits
  4. 17 Feb, 2017 3 commits
  5. 15 Feb, 2017 1 commit
  6. 14 Feb, 2017 2 commits
  7. 11 Feb, 2017 1 commit
  8. 07 Feb, 2017 2 commits
  9. 30 Jan, 2017 1 commit
  10. 24 Jan, 2017 3 commits
  11. 23 Jan, 2017 1 commit
  12. 24 Dec, 2016 1 commit
  13. 19 Dec, 2016 2 commits
    • C
      drm/i915: Swap if(enable_execlists) in i915_gem_request_alloc for a vfunc · f73e7399
      Chris Wilson authored
      A fairly trivial move of a matching pair of routines (for preparing a
      request for construction) onto an engine vfunc. The ulterior motive is
      to be able to create a mock request implementation.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-7-chris@chris-wilson.co.uk
      f73e7399
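Swapping a global `if (enable_execlists)` branch for an engine vfunc might look roughly like the following. The types and function names are simplified stand-ins chosen for this sketch, not the driver's actual definitions.

```c
/* Simplified stand-ins; the real driver uses its own engine/request types. */
struct request {
	int prepared_by; /* records which backend prepared the request */
};

struct engine {
	/* Backend-specific preparation, chosen once at engine setup. */
	int (*request_alloc)(struct request *rq);
};

enum { BACKEND_LEGACY = 1, BACKEND_EXECLISTS = 2 };

static int legacy_request_alloc(struct request *rq)
{
	rq->prepared_by = BACKEND_LEGACY;
	return 0;
}

static int execlists_request_alloc(struct request *rq)
{
	rq->prepared_by = BACKEND_EXECLISTS;
	return 0;
}

/* The common path no longer branches on a global flag. */
static int request_alloc(struct engine *engine, struct request *rq)
{
	return engine->request_alloc(rq);
}
```

A mock engine for testing can then install its own `request_alloc` hook without touching the common path, which matches the ulterior motive stated in the commit message.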
    • C
      drm/i915: Unify active context tracking between legacy/execlists/guc · e8a9c58f
      Chris Wilson authored
      The requests conversion introduced a nasty bug where we could generate a
      new request in the middle of constructing a request if we needed to idle
      the system in order to evict space for a context. The request to idle
      would be executed (and waited upon) before the current one, creating a
      minor havoc in the seqno accounting, as we will consider the current
      request to already be completed (prior to deferred seqno assignment) but
      ring->last_retired_head would have been updated and still could allow
      us to overwrite the current request before execution.
      
      We also employed two different mechanisms to track the active context
      until it was switched out. The legacy method allowed for waiting upon an
      active context (it could forcibly evict any vma, including context's),
      but the execlists method took a step backwards by pinning the vma for
      the entire active lifespan of the context (the only way to evict was to
      idle the entire GPU, not individual contexts). However, to circumvent
      the tricky issue of locking (i.e. we cannot take struct_mutex at the
      time of i915_gem_request_submit(), where we would want to move the
      previous context onto the active tracker and unpin it), we take the
      execlists approach and keep the contexts pinned until retirement.
      The benefit of the execlists approach, more important for execlists than
      legacy, was the reduction in work in pinning the context for each
      request - as the context was kept pinned until idle, it could short
      circuit the pinning for all active contexts.
      
      We introduce new engine vfuncs to pin and unpin the context
      respectively. The context is pinned at the start of the request, and
      only unpinned when the following request is retired (this ensures that
      the context is idle and coherent in main memory before we unpin it). We
      move the engine->last_context tracking into the retirement itself
      (rather than during request submission) in order to allow the submission
      to be reordered or unwound without undue difficulty.
      
      And finally an ulterior motive for unifying context handling was to
      prepare for mock requests.
      
      v2: Rename to last_retired_context, split out legacy_context tracking
      for MI_SET_CONTEXT.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161218153724.8439-3-chris@chris-wilson.co.uk
      e8a9c58f
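The pin/unpin rule described above (pin at the start of a request, unpin only when the following request is retired) can be modelled in a few lines. This is a sketch under simplifying assumptions: a single engine, in-order retirement, and hypothetical helper names; `last_retired_context` follows the field name mentioned in the v2 note.

```c
#include <stddef.h>

struct context {
	int pin_count;
};

struct engine {
	/* Context of the most recently retired request, still pinned. */
	struct context *last_retired_context;
};

struct request {
	struct context *ctx;
};

/* Pin the context when the request is constructed. */
static void request_alloc(struct request *rq, struct context *ctx)
{
	ctx->pin_count++;
	rq->ctx = ctx;
}

/*
 * On retirement, release the pin held on behalf of the previously retired
 * request and keep this request's context pinned in its place, so a context
 * is only unpinned once a later request has completed after it.
 */
static void request_retire(struct engine *engine, struct request *rq)
{
	if (engine->last_retired_context)
		engine->last_retired_context->pin_count--;
	engine->last_retired_context = rq->ctx;
}
```

Because the bookkeeping happens at retirement rather than submission, reordering or unwinding a submission does not disturb it.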
  14. 21 Nov, 2016 2 commits
    • M
      drm/i915: Decouple hang detection from hangcheck period · 3fe3b030
      Mika Kuoppala authored
      Hangcheck state accumulation has gained more steps
      over the years, like head movement and more recently the
      subunit inactivity check. As the subunit sampling is only
      done if the previous state check showed inactivity, we
      have added more stages (and time) to reach a hang verdict.
      
      Asymmetric engine states led to a different actual weight of
      'one hangcheck unit', and it was demonstrated in some
      hangs that, due to the difference in stages, simpler engines
      were falsely accused of a hang as their scoring was much
      quicker to accumulate above the hang threshold.
      
      To completely decouple the hangcheck guilty score
      from the hangcheck period, convert hangcheck score to a
      rough period of inactivity measurement. As these are
      tracked as jiffies, they are meaningful also across
      reset boundaries. This makes finding a guilty engine
      more accurate across multi engine activity scenarios,
      especially across asymmetric engines.
      
      We lose the ability to detect cross-batch malicious attempts
      to hinder progress. The plan is to move this functionality
      into context banning, which is a more natural fit,
      later in the series.
      
      v2: use time_before macros (Chris)
          reinstate the pardoning of moving engine after hc (Chris)
      v3: avoid global state for per engine stall detection (Chris)
      v4: take timeline last retirement into account (Chris)
      v5: do debug print on pardoning, split out retirement timestamp (Chris)
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      3fe3b030
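Measuring a hang as a period of inactivity, rather than as an accumulated score, could be sketched as below. `tick_after()` mimics the wrap-safe comparison style of the kernel's `time_after()` macro; the struct, the `HUNG_TICKS` threshold and `hangcheck_sample()` are hypothetical and only track seqno progress, whereas the commit also considers head and subunit movement.

```c
#include <stdbool.h>

/* Wrap-safe "a is after b" for a free-running tick counter. */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical threshold: ticks of inactivity before declaring a hang. */
#define HUNG_TICKS 300UL

struct engine_hangcheck {
	unsigned long action_timestamp; /* tick of the last observed progress */
	unsigned int last_seqno;
};

/* Returns true once the engine has made no progress for HUNG_TICKS. */
static bool hangcheck_sample(struct engine_hangcheck *hc,
			     unsigned int seqno, unsigned long now)
{
	if (seqno != hc->last_seqno) {
		/* Progress was made: restart the inactivity window. */
		hc->last_seqno = seqno;
		hc->action_timestamp = now;
		return false;
	}
	return tick_after(now, hc->action_timestamp + HUNG_TICKS);
}
```

The verdict depends only on elapsed ticks, so it is independent of how often the check runs or how many sampling stages an engine has.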
    • M
      drm/i915: Split up hangcheck phases · 6e16d028
      Mika Kuoppala authored
      In order to simplify hangcheck state keeping, split the hangcheck
      per-engine loop into three phases: state load, action, state save.
      
      Add a few more hangcheck actions to distinguish between seqno, head
      and subunit movements. This helps to gather all the hangcheck
      actions under a single switch umbrella.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      6e16d028
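A rough illustration of the three-phase split (state load, action, state save) with all actions handled under one switch. The enum values, structs and function names are invented for this sketch and do not match the driver's identifiers; the hardware sample is passed in by the caller instead of being read from registers.

```c
/* Hypothetical action names for this sketch. */
enum hangcheck_action {
	ACTION_ACTIVE_SEQNO,    /* seqno advanced */
	ACTION_ACTIVE_HEAD,     /* head moved within the same request */
	ACTION_ACTIVE_SUBUNITS, /* subunit state changed */
	ACTION_STALLED,         /* nothing moved */
};

struct sample {
	unsigned int seqno, head, subunits;
};

struct hangcheck_state {
	struct sample last;
	unsigned int stalled_samples;
};

/* Compare the stored sample with the current one and classify the change. */
static enum hangcheck_action decide(const struct sample *prev,
				    const struct sample *cur)
{
	if (cur->seqno != prev->seqno)
		return ACTION_ACTIVE_SEQNO;
	if (cur->head != prev->head)
		return ACTION_ACTIVE_HEAD;
	if (cur->subunits != prev->subunits)
		return ACTION_ACTIVE_SUBUNITS;
	return ACTION_STALLED;
}

static enum hangcheck_action hangcheck_engine(struct hangcheck_state *hc,
					      struct sample cur)
{
	/* Phase 1: state load. */
	struct sample prev = hc->last;

	/* Phase 2: action, with every case under a single switch. */
	enum hangcheck_action action = decide(&prev, &cur);

	switch (action) {
	case ACTION_ACTIVE_SEQNO:
	case ACTION_ACTIVE_HEAD:
	case ACTION_ACTIVE_SUBUNITS:
		hc->stalled_samples = 0;
		break;
	case ACTION_STALLED:
		hc->stalled_samples++;
		break;
	}

	/* Phase 3: state save. */
	hc->last = cur;
	return action;
}
```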
  15. 15 Nov, 2016 3 commits
  16. 09 Nov, 2016 1 commit
  17. 01 Nov, 2016 1 commit
    • C
      drm/i915: Avoid accessing request->timeline outside of its lifetime · cb399eab
      Chris Wilson authored
      Whilst waiting on a request, we may do so without holding any locks or
      any guards beyond a reference to the request. In order to avoid taking
      locks within request deallocation, we drop references to its timeline
      (via the context and ppgtt) upon retirement. We should avoid chasing
      such pointers outside of their control, in particular we inspect the
      request->timeline to see if we may restore the RPS waitboost for a
      client. If we instead look at the engine->timeline, we will have similar
      behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
      amount of reward we give towards stalling clients (i.e. only if the
      client stalls and the GPU is uncontended does it reclaim its boost).
      This restores behaviour back to pre-timelines, whilst fixing:
      
      [  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
      [  645.078577] Read of size 4 by task gem_exec_schedu/28408
      [  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
      [  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
      [  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
      [  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
      [  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
      [  645.079345] Call Trace:
      [  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
      [  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
      [  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
      [  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
      [  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
      [  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
      [  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
      [  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
      [  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
      [  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
      [  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
      [  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
      [  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
      [  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
      [  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
      [  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
      [  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
      [  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
      [  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
      [  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
      [  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
      [  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
      [  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
      [  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
      [  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
      [  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
      [  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
      [  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
      [  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
      [  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
      [  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
      [  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
      [  645.082431] Allocated:
      [  645.082480] PID = 28408
      [  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
      [  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
      [  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
      [  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
      [  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
      [  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
      [  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
      [  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
      [  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
      [  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  645.083551] Freed:
      [  645.083599] PID = 27629
      [  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
      [  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
      [  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
      [  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
      [  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
      [  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
      [  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
      [  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
      [  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
      [  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
      [  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
      [  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
      [  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
      [  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
      [  645.085811] Memory state around the buggy address:
      [  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086138]                                ^
      [  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      v2: Add a comment to document the hint-like nature of
       intel_engine_last_submit()
      
      Fixes: 73cb9701 ("drm/i915: Combine seqno + tracking into a global timeline struct")
      Fixes: 80b204bc ("drm/i915: Enable multiple timelines")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk
      cb399eab
  18. 29 Oct, 2016 7 commits
    • C
      drm/i915: Enable multiple timelines · 80b204bc
      Chris Wilson authored
      With the infrastructure converted over to tracking multiple timelines in
      the GEM API whilst preserving the efficiency of using a single execution
      timeline internally, we can now assign a separate timeline to every
      context with full-ppgtt.
      
      v2: Add a comment to indicate the xfer between timelines upon submission.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk
      80b204bc
    • C
      drm/i915: Convert breadcrumbs spinlock to be irqsafe · f6168e33
      Chris Wilson authored
      The breadcrumbs are about to be used from within IRQ context sections
      (e.g. nouveau signals a fence from an interrupt handler causing us to
      submit a new request) and/or from bottom-half tasklets (i.e.
      intel_lrc_irq_handler), therefore we need to employ the irqsafe spinlock
      variants.
      
      For example, deferring the request submission to the
      intel_lrc_irq_handler generates this trace:
      
      [   66.388639] =================================
      [   66.388650] [ INFO: inconsistent lock state ]
      [   66.388663] 4.9.0-rc2+ #56 Not tainted
      [   66.388672] ---------------------------------
      [   66.388682] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [   66.388695] swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
      [   66.388706]  (&(&b->lock)->rlock){+.?...} , at: [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
      [   66.388761] {SOFTIRQ-ON-W} state was registered at:
      [   66.388772]   [   66.388783] [<ffffffff810bd842>] __lock_acquire+0x682/0x1870
      [   66.388795]   [   66.388803] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
      [   66.388814]   [   66.388824] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
      [   66.388835]   [   66.388845] [<ffffffff81401e41>] intel_engine_reset_breadcrumbs+0x21/0xb0
      [   66.388857]   [   66.388866] [<ffffffff81403ae7>] gen8_init_common_ring+0x67/0x100
      [   66.388878]   [   66.388887] [<ffffffff81403b92>] gen8_init_render_ring+0x12/0x60
      [   66.388903]   [   66.388912] [<ffffffff813f8707>] i915_gem_init_hw+0xf7/0x2a0
      [   66.388927]   [   66.388936] [<ffffffff813f899b>] i915_gem_init+0xbb/0xf0
      [   66.388950]   [   66.388959] [<ffffffff813b4980>] i915_driver_load+0x7e0/0x1330
      [   66.388978]   [   66.388988] [<ffffffff813c09d8>] i915_pci_probe+0x28/0x40
      [   66.389003]   [   66.389013] [<ffffffff812fa0db>] pci_device_probe+0x8b/0xf0
      [   66.389028]   [   66.389037] [<ffffffff8147737e>] driver_probe_device+0x21e/0x430
      [   66.389056]   [   66.389065] [<ffffffff8147766e>] __driver_attach+0xde/0xe0
      [   66.389080]   [   66.389090] [<ffffffff814751ad>] bus_for_each_dev+0x5d/0x90
      [   66.389105]   [   66.389113] [<ffffffff81477799>] driver_attach+0x19/0x20
      [   66.389134]   [   66.389144] [<ffffffff81475ced>] bus_add_driver+0x15d/0x260
      [   66.389159]   [   66.389168] [<ffffffff81477e3b>] driver_register+0x5b/0xd0
      [   66.389183]   [   66.389281] [<ffffffff812fa19b>] __pci_register_driver+0x5b/0x60
      [   66.389301]   [   66.389312] [<ffffffff81aed333>] i915_init+0x3e/0x45
      [   66.389326]   [   66.389336] [<ffffffff81ac2ffa>] do_one_initcall+0x8b/0x118
      [   66.389350]   [   66.389359] [<ffffffff81ac323a>] kernel_init_freeable+0x1b3/0x23b
      [   66.389378]   [   66.389387] [<ffffffff8160fc39>] kernel_init+0x9/0x100
      [   66.389402]   [   66.389411] [<ffffffff816180e7>] ret_from_fork+0x27/0x40
      [   66.389426] irq event stamp: 315865
      [   66.389438] hardirqs last  enabled at (315864): [<ffffffff816178f1>] _raw_spin_unlock_irqrestore+0x31/0x50
      [   66.389469] hardirqs last disabled at (315865): [<ffffffff816176b3>] _raw_spin_lock_irqsave+0x13/0x50
      [   66.389499] softirqs last  enabled at (315818): [<ffffffff8107a04c>] _local_bh_enable+0x1c/0x50
      [   66.389530] softirqs last disabled at (315819): [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
      [   66.389559]
      [   66.389559] other info that might help us debug this:
      [   66.389580]  Possible unsafe locking scenario:
      [   66.389580]
      [   66.389598]        CPU0
      [   66.389609]        ----
      [   66.389620]   lock(&(&b->lock)->rlock);
      [   66.389650]   <Interrupt>
      [   66.389661]     lock(&(&b->lock)->rlock);
      [   66.389690]
      [   66.389690]  *** DEADLOCK ***
      [   66.389690]
      [   66.389715] 2 locks held by swapper/1/0:
      [   66.389728]  #0: (&(&tl->lock)->rlock){..-...}, at: [<ffffffff81403e01>] intel_lrc_irq_handler+0x201/0x3c0
      [   66.389785]  #1: (&(&req->lock)->rlock/1){..-...}, at: [<ffffffff813fc0af>] __i915_gem_request_submit+0x8f/0x170
      [   66.389854]
      [   66.389854] stack backtrace:
      [   66.389959] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc2+ #56
      [   66.389976] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
      [   66.389999]  ffff88027fd03c58 ffffffff812beae5 ffff88027696e680 ffffffff822afe20
      [   66.390036]  ffff88027fd03ca8 ffffffff810bb420 0000000000000001 0000000000000000
      [   66.390070]  0000000000000000 0000000000000006 0000000000000004 ffff88027696ee10
      [   66.390104] Call Trace:
      [   66.390117]  <IRQ>
      [   66.390128]  [<ffffffff812beae5>] dump_stack+0x68/0x93
      [   66.390147]  [<ffffffff810bb420>] print_usage_bug+0x1d0/0x1e0
      [   66.390164]  [<ffffffff810bb8a0>] mark_lock+0x470/0x4f0
      [   66.390181]  [<ffffffff810ba9d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0
      [   66.390203]  [<ffffffff810bd75d>] __lock_acquire+0x59d/0x1870
      [   66.390221]  [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0
      [   66.390237]  [<ffffffff810bedbc>] ? lock_acquire+0x6c/0xb0
      [   66.390255]  [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
      [   66.390273]  [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40
      [   66.390291]  [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150
      [   66.390309]  [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150
      [   66.390327]  [<ffffffff813fc170>] __i915_gem_request_submit+0x150/0x170
      [   66.390345]  [<ffffffff81403e8b>] intel_lrc_irq_handler+0x28b/0x3c0
      [   66.390363]  [<ffffffff81079d97>] tasklet_action+0x57/0xc0
      [   66.390380]  [<ffffffff8107a249>] __do_softirq+0x119/0x240
      [   66.390396]  [<ffffffff8107a50e>] irq_exit+0xbe/0xd0
      [   66.390414]  [<ffffffff8101afd5>] do_IRQ+0x65/0x110
      [   66.390431]  [<ffffffff81618806>] common_interrupt+0x86/0x86
      [   66.390446]  <EOI>
      [   66.390457]  [<ffffffff814ec6d1>] ? cpuidle_enter_state+0x151/0x200
      [   66.390480]  [<ffffffff814ec7a2>] cpuidle_enter+0x12/0x20
      [   66.390498]  [<ffffffff810b639e>] call_cpuidle+0x1e/0x40
      [   66.390516]  [<ffffffff810b65ae>] cpu_startup_entry+0x10e/0x1f0
      [   66.390534]  [<ffffffff81036133>] start_secondary+0x103/0x130
      
      (This is split out of the defer global seqno allocation patch due to
      the realisation that we need a more complete conversion if we want to defer
      request submission even further.)
      
      v2: lockdep was warning about mixed SOFTIRQ contexts not HARDIRQ
      contexts so we only need to use spin_lock_bh and not disable interrupts.
      
      v3: We need full irq protection as we may be called from a third party
      interrupt handler (via fences).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-32-chris@chris-wilson.co.uk
      f6168e33
    • C
      drm/i915: Move the global sync optimisation to the timeline · 85e17f59
      Chris Wilson authored
      Currently we try to reduce the number of synchronisations (now the
      number of requests we need to wait upon) by noting that if we have
      earlier waited upon a request, all subsequent requests in the timeline
      will be after the wait. This only applies to requests in this timeline,
      as other timelines will not be ordered by that waiter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
      85e17f59
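The per-timeline optimisation can be sketched as follows: each waiting timeline remembers the latest seqno it has already synchronised against and skips any wait that is already covered. This is a simplified model with a single signalling engine; `struct timeline`, `need_wait()` and `seqno_passed()` are names assumed for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is at or after b" for 32-bit seqnos. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

struct timeline {
	uint32_t sync_seqno; /* latest seqno this timeline already waited upon */
	bool has_sync;       /* false until the first wait is recorded */
};

/*
 * Returns true if a new wait must be emitted for signal_seqno. The record
 * is per timeline: a wait made by one timeline orders nothing on another.
 */
static bool need_wait(struct timeline *tl, uint32_t signal_seqno)
{
	if (tl->has_sync && seqno_passed(tl->sync_seqno, signal_seqno))
		return false;

	tl->sync_seqno = signal_seqno;
	tl->has_sync = true;
	return true;
}
```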
    • C
      drm/i915: Defer breadcrumb emission · caddfe71
      Chris Wilson authored
      Move the actual emission of the breadcrumb for closing the request from
      i915_add_request() to the submit callback. (It can be moved later when
      required.) This allows us to defer the allocation of the global_seqno
      from request construction to actual submission, allowing us to emit the
      requests out of order (with respect to the order of their construction; they
      will still only be executed once all of their dependencies are resolved,
      including that all earlier requests on their timeline have been
      submitted.) We have to specialise how we then emit the request in order
      to write into the preallocated space, rather than at the tail of the
      ringbuffer (which will have been advanced by the addition of new
      requests).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
      caddfe71
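Deferred emission into preallocated space might be modelled like this: construction reserves room for the breadcrumb and records its offset, and submission later writes the breadcrumb at that offset rather than at the (already advanced) tail. The ring layout, the two-dword breadcrumb size and the opcode value are made up for the sketch, and wrap-around and bounds checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 64
#define BREADCRUMB_DWORDS 2 /* hypothetical breadcrumb size */

struct ring {
	uint32_t cmd[RING_DWORDS];
	size_t tail; /* next free dword */
};

struct request {
	size_t postfix;        /* offset reserved for the breadcrumb */
	uint32_t global_seqno; /* 0 until submission */
};

/* Construction: emit the payload and reserve space for the breadcrumb. */
static void request_construct(struct ring *ring, struct request *rq,
			      uint32_t payload)
{
	ring->cmd[ring->tail++] = payload;
	rq->postfix = ring->tail;
	ring->tail += BREADCRUMB_DWORDS;
	rq->global_seqno = 0;
}

/* Submission: assign the seqno and fill in the reserved slot. */
static void request_submit(struct ring *ring, struct request *rq,
			   uint32_t seqno)
{
	rq->global_seqno = seqno;
	ring->cmd[rq->postfix] = 0xb0000000u; /* made-up "store seqno" opcode */
	ring->cmd[rq->postfix + 1] = seqno;
}
```

Because the slot is fixed at construction, a request built first on one ring can still receive a later global seqno than a request built afterwards on another ring.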
    • C
      drm/i915: Record space required for breadcrumb emission · 98f29e8d
      Chris Wilson authored
      In the next patch, we will use deferred breadcrumb emission. That requires
      reserving sufficient space in the ringbuffer to emit the breadcrumb, which
      first requires us to know how large the breadcrumb is.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
      98f29e8d
    • C
      drm/i915: Rename ->emit_request to ->emit_breadcrumb · 9b81d556
      Chris Wilson authored
      Now that the emission of the request tail and its submission to hardware
      are two separate steps, engine->emit_request() is confusing.
      engine->emit_request() is called to emit the breadcrumb commands for the
      request into the ring, name it such (engine->emit_breadcrumb).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
      9b81d556
    • C
      drm/i915: Combine seqno + tracking into a global timeline struct · 73cb9701
      Chris Wilson authored
      Our timelines are more than just a seqno. They also provide an ordered
      list of requests to be executed. Due to the restriction of handling
      individual address spaces, we are limited to a timeline per address
      space but we use a fence context per engine within.
      
      Our first step to introducing independent timelines per context (i.e. to
      allow each context to have a queue of requests to execute that have a
      defined set of dependencies on other requests) is to provide a timeline
      abstraction for the global execution queue.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
      73cb9701