1. 26 May, 2017 1 commit
  2. 25 May, 2017 1 commit
    • J
      drm/i915: Serialize GTT/Aperture accesses on BXT · 0ef34ad6
      Jon Bloomfield committed
      BXT has a H/W issue with IOMMU which can lead to system hangs when
      Aperture accesses are queued within the GAM behind GTT Accesses.
      
      This patch avoids the condition by wrapping all GTT updates in stop_machine
      and using a flushing read prior to restarting the machine.
      
      The stop_machine guarantees no new Aperture accesses can begin while
      the PTE writes are being emitted. The flushing read ensures that
      any following Aperture accesses cannot begin until the PTE writes
      have been cleared out of the GAM's FIFO.
      
      Only FOLLOWING Aperture accesses need to be separated from in flight
      PTE updates. PTE Writes may follow tightly behind already in flight
      Aperture accesses, so no flushing read is required at the start of
      a PTE update sequence.
      
      This issue was reproduced by running
      	igt/gem_readwrite and
      	igt/gem_render_copy
      simultaneously from different processes, each in a tight loop,
      with INTEL_IOMMU enabled.
      
      This patch was originally published as:
      	drm/i915: Serialize GTT Updates on BXT
      
      v2: Move bxt/iommu detection into static function
          Remove #ifdef CONFIG_INTEL_IOMMU protection
          Make function names more reflective of purpose
          Move flushing read into static function
      
      v3: Tidy up for checkpatch.pl
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: John Harrison <john.C.Harrison@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1495641251-30022-1-git-send-email-jon.bloomfield@intel.com
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  3. 24 May, 2017 2 commits
  4. 23 May, 2017 2 commits
  5. 22 May, 2017 3 commits
  6. 20 May, 2017 1 commit
  7. 19 May, 2017 7 commits
  8. 18 May, 2017 13 commits
  9. 17 May, 2017 10 commits
    • C
      drm/i915: Don't force serialisation on marking up execlists irq posted · 955a4b89
      Chris Wilson committed
      Since we coordinate with the execlists tasklet using a locked schedule
      operation that ensures that after we set the engine->irq_posted we
      always have an invocation of the tasklet, we do not need to use a locked
      operation to set the engine->irq_posted itself.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-12-chris@chris-wilson.co.uk
    • C
      drm/i915: Stop inlining the execlists IRQ handler · 5d3d69d5
      Chris Wilson committed
      As the handler is now quite complex, involving a few atomics, the cost
      of the function preamble is negligible in comparison and so we should
      leave the function out-of-line for better I$.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-11-chris@chris-wilson.co.uk
    • C
      drm/i915/execlists: Reduce lock contention between schedule/submit_request · 349bdb68
      Chris Wilson committed
      If we do not need to perform priority bumping, and we haven't yet
      submitted the request, we can update its priority in situ and skip
      acquiring the engine locks -- thus avoiding any contention between us
      and submit/execute.
      
      v2: Remove the stack element from the list if we can do the early
      assignment.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk
    • C
      drm/i915: Create a kmem_cache to allocate struct i915_priolist from · c5cf9a91
      Chris Wilson committed
      The i915_priolist are allocated within an atomic context on a path where
      we wish to minimise latency. If we use a dedicated kmem_cache, we have
      the advantage of a local freelist from which to service new requests
      that should keep the latency impact of an allocation small. Though
      currently we expect the majority of requests to be at default priority
      (and so hit the preallocated priolist), once userspace starts using
      priorities they are likely to use many fine-grained policies, improving
      the utilisation of a private slab.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk
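As a rough userspace analogy for why a private freelist keeps allocation latency small, the sketch below recycles freed objects before falling back to the general allocator. It is not the kernel's kmem_cache API; `struct cache`, `cache_alloc()` and `cache_free()` are hypothetical names invented for this illustration, and the `struct priolist` here is a stand-in rather than the driver's definition.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in object type; the real i915_priolist has different members. */
struct priolist {
	int priority;
	struct priolist *next_free;	/* link used only while on the freelist */
};

/* Hypothetical cache: a singly linked list of recycled objects. */
struct cache {
	struct priolist *free;
};

static struct priolist *cache_alloc(struct cache *c)
{
	struct priolist *p = c->free;

	if (p) {
		/* Fast path: pop a recycled object, no general allocator call. */
		c->free = p->next_free;
		return p;
	}
	/* Slow path: the freelist is empty. */
	return malloc(sizeof(*p));
}

static void cache_free(struct cache *c, struct priolist *p)
{
	/* Keep the object for the next allocation instead of releasing it. */
	p->next_free = c->free;
	c->free = p;
}
```

An object released with `cache_free()` is handed straight back by the next `cache_alloc()`, which is the property the commit message relies on for the atomic submission path.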
    • C
      drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579
      Chris Wilson committed
      All the requests at the same priority are executed in FIFO order. They
      do not need to be stored in the rbtree themselves, as they are a simple
      list within a level. If we move the requests at one priority into a list,
      we can then reduce the rbtree to the set of priorities. This should keep
      the height of the rbtree small, as the number of active priorities can not
      exceed the number of active requests and should be typically only a few.
      
      Currently, we have ~2k possible different priority levels, that may
      increase to allow even more fine-grained selection. Allocating those in
      advance seems a waste (and may be impossible), so we opt for allocating
      upon first use, and freeing after its requests are depleted. To avoid
      the possibility of an allocation failure causing us to lose a request,
      we preallocate the default priority (0) and bump any request to that
      priority if we fail to allocate the appropriate priolist for it. Having a
      request (that is ready to run, so not leading to corruption) execute
      out-of-order is better than leaking the request (and its dependency
      tree) entirely.
      
      There should be a benefit to reducing execlists_dequeue() to principally
      using a simple list (and reducing the frequency of both rbtree iteration
      and balancing on erase) but for typical workloads, request coalescing
      should be small enough that we don't notice any change. The main gain is
      from improving PI calls to schedule, and the explicit list within a
      level should make request unwinding simpler (we just need to insert at
      the head of the list rather than the tail and not have to make the
      rbtree search more complicated).
      
      v2: Avoid use-after-free when deleting a depleted priolist
      
      v3: Michał found the solution to handling the allocation failure
      gracefully. If we disable all priority scheduling following the
      allocation failure, those requests will be executed in fifo and we will
      ensure that this request and its dependencies are in strict fifo (even
      when it doesn't realise it is only a single list). Normal scheduling is
      restored once we know the device is idle, until the next failure!
      Suggested-by: Michał Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk
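The structure described in this commit (one node per active priority, each holding a FIFO of requests) can be sketched in userspace as follows. This is a simplified illustration, not the driver code: a sorted linked list of levels stands in for the rbtree, the v3 fallback of disabling priority scheduling is replaced by the simpler "bump to the preallocated default level" behaviour from the main text, and every name (`struct queue`, `lookup_level()`, `enqueue()`, `dequeue()`) is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct request {
	int id;
	struct request *next;		/* FIFO link within one priority level */
};

struct level {
	int priority;
	struct request *head, *tail;	/* requests at this priority, in FIFO order */
	struct level *next;		/* levels are kept sorted, highest first */
};

struct queue {
	struct level *levels;		/* only levels that currently hold requests */
	struct level default_level;	/* preallocated level for priority 0 */
};

static void queue_init(struct queue *q)
{
	q->levels = NULL;
	q->default_level = (struct level){ 0 };
}

/* Find the level for prio, creating it on first use. */
static struct level *lookup_level(struct queue *q, int prio)
{
	struct level **pos = &q->levels;
	struct level *lv;

	while (*pos && (*pos)->priority > prio)
		pos = &(*pos)->next;
	if (*pos && (*pos)->priority == prio)
		return *pos;

	lv = prio == 0 ? &q->default_level : malloc(sizeof(*lv));
	if (!lv)
		return lookup_level(q, 0);	/* allocation failed: use default */

	lv->priority = prio;
	lv->head = lv->tail = NULL;
	lv->next = *pos;
	*pos = lv;
	return lv;
}

static void enqueue(struct queue *q, struct request *rq, int prio)
{
	struct level *lv = lookup_level(q, prio);

	rq->next = NULL;
	if (lv->tail)
		lv->tail->next = rq;
	else
		lv->head = rq;
	lv->tail = rq;
}

/* Take the oldest request from the highest active priority. */
static struct request *dequeue(struct queue *q)
{
	struct level *lv = q->levels;
	struct request *rq;

	if (!lv)
		return NULL;

	rq = lv->head;
	lv->head = rq->next;
	if (!lv->head) {
		/* Level depleted: unlink it, and free it unless preallocated. */
		q->levels = lv->next;
		if (lv != &q->default_level)
			free(lv);
	}
	return rq;
}
```

Because a level is only linked while it holds at least one request, the number of levels to search never exceeds the number of queued requests, which mirrors the argument for keeping the rbtree small.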
    • C
      drm/i915: Use a define for the default priority [0] · e4f815f6
      Chris Wilson committed
      Explicitly assign the default priority, and give it a name. After much
      discussion, we have chosen to call it I915_PRIORITY_NORMAL!
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-7-chris@chris-wilson.co.uk
    • C
      drm/i915: Don't mark an execlists context-switch when idle · a4b2b015
      Chris Wilson committed
      If we *know* that the engine is idle, i.e. we have no more contexts in
      flight, we can skip any spurious CSB idle interrupts. These spurious
      interrupts seem to arrive long after we assert that the engines are
      completely idle, triggering later assertions:
      
      [  178.896646] intel_engine_is_idle(bcs): interrupt not handled, irq_posted=2
      [  178.896655] ------------[ cut here ]------------
      [  178.896658] kernel BUG at drivers/gpu/drm/i915/intel_engine_cs.c:226!
      [  178.896661] invalid opcode: 0000 [#1] SMP
      [  178.896663] Modules linked in: i915(E) x86_pkg_temp_thermal(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) nls_ascii(E) nls_cp437(E) vfat(E) fat(E) intel_gtt(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) aesni_intel(E) prime_numbers(E) evdev(E) aes_x86_64(E) drm(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_me(E) mei(E) lpc_ich(E) efivars(E) mfd_core(E) battery(E) video(E) acpi_pad(E) button(E) tpm_tis(E) tpm_tis_core(E) tpm(E) autofs4(E) i2c_i801(E) fan(E) thermal(E) i2c_designware_platform(E) i2c_designware_core(E)
      [  178.896694] CPU: 1 PID: 522 Comm: gem_exec_whispe Tainted: G            E   4.11.0-rc5+ #14
      [  178.896702] task: ffff88040aba8d40 task.stack: ffffc900003f0000
      [  178.896722] RIP: 0010:intel_engine_init_global_seqno+0x1db/0x1f0 [i915]
      [  178.896725] RSP: 0018:ffffc900003f3ab0 EFLAGS: 00010246
      [  178.896728] RAX: 0000000000000000 RBX: ffff88040af54000 RCX: 0000000000000000
      [  178.896731] RDX: ffff88041ec933e0 RSI: ffff88041ec8cc48 RDI: ffff88041ec8cc48
      [  178.896734] RBP: ffffc900003f3ac8 R08: 0000000000000000 R09: 000000000000047d
      [  178.896736] R10: 0000000000000040 R11: ffff88040b344f80 R12: 0000000000000000
      [  178.896739] R13: ffff88040bce0000 R14: ffff88040bce52d8 R15: ffff88040bce0000
      [  178.896742] FS:  00007f2cccc2d8c0(0000) GS:ffff88041ec80000(0000) knlGS:0000000000000000
      [  178.896746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  178.896749] CR2: 00007f41ddd8f000 CR3: 000000040bb03000 CR4: 00000000001406e0
      [  178.896752] Call Trace:
      [  178.896768]  reset_all_global_seqno.part.33+0x4e/0xd0 [i915]
      [  178.896782]  i915_gem_request_alloc+0x304/0x330 [i915]
      [  178.896795]  i915_gem_do_execbuffer+0x8a1/0x17d0 [i915]
      [  178.896799]  ? remove_wait_queue+0x48/0x50
      [  178.896812]  ? i915_wait_request+0x300/0x590 [i915]
      [  178.896816]  ? wake_up_q+0x70/0x70
      [  178.896819]  ? refcount_dec_and_test+0x11/0x20
      [  178.896823]  ? reservation_object_add_excl_fence+0xa5/0x100
      [  178.896835]  i915_gem_execbuffer2+0xab/0x1f0 [i915]
      [  178.896844]  drm_ioctl+0x1e6/0x460 [drm]
      [  178.896858]  ? i915_gem_execbuffer+0x260/0x260 [i915]
      [  178.896862]  ? dput+0xcf/0x250
      [  178.896866]  ? full_proxy_release+0x66/0x80
      [  178.896869]  ? mntput+0x1f/0x30
      [  178.896872]  do_vfs_ioctl+0x8f/0x5b0
      [  178.896875]  ? ____fput+0x9/0x10
      [  178.896878]  ? task_work_run+0x80/0xa0
      [  178.896881]  SyS_ioctl+0x3c/0x70
      [  178.896885]  entry_SYSCALL_64_fastpath+0x17/0x98
      [  178.896888] RIP: 0033:0x7f2ccb455ca7
      [  178.896890] RSP: 002b:00007ffcabec72d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  178.896894] RAX: ffffffffffffffda RBX: 000055f897a44b90 RCX: 00007f2ccb455ca7
      [  178.896897] RDX: 00007ffcabec74a0 RSI: 0000000040406469 RDI: 0000000000000003
      [  178.896900] RBP: 00007f2ccb70a440 R08: 00007f2ccb70d0a4 R09: 0000000000000000
      [  178.896903] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      [  178.896905] R13: 000055f89782d71a R14: 00007ffcabecf838 R15: 0000000000000003
      [  178.896908] Code: 00 31 d2 4c 89 ef 8d 70 48 41 ff 95 f8 06 00 00 e9 68 fe ff ff be 0f 00 00 00 48 c7 c7 48 dc 37 a0 e8 fa 33 d6 e0 e9 0b ff ff ff <0f> 0b 0f 0b 0f 0b 0f 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00
      
      On the other hand, by ignoring the interrupt do we risk running out of
      space in CSB ring? Testing for a few hours suggests not, i.e. that we
      only seem to get the odd delayed CSB idle notification.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-6-chris@chris-wilson.co.uk
    • C
      drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9
      Chris Wilson committed
      add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
      function                                     old     new   delta
      execlists_submit_ports                       262     471    +209
      port_assign.isra                               -     136    +136
      capture                                     6344    6359     +15
      reset_common_ring                            438     452     +14
      execlists_submit_request                     228     238     +10
      gen8_init_common_ring                        334     341      +7
      intel_engine_is_idle                         106     105      -1
      i915_engine_info                            2314    2290     -24
      __i915_gem_set_wedged_BKL                    485     411     -74
      intel_lrc_irq_handler                       1789    1604    -185
      execlists_update_context                     294       -    -294
      
      The most important change there is the improvement to
      intel_lrc_irq_handler and execlists_submit_ports (a net improvement since
      execlists_update_context is now inlined).
      
      v2: Use the port_api() for guc as well (even though currently we do not
      pack any counters in there, yet) and hide all port->request_count inside
      the helpers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk
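A minimal sketch of packing a small counter into the low bits of a request pointer is shown below. It assumes the pointed-to object is at least 4-byte aligned, so the two lowest address bits are always zero and can carry the count. The helper names, the 2-bit width and the `struct port` layout are assumptions made for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width: 2 bits, so counts 0..3 fit alongside the pointer. */
#define PORT_COUNT_BITS 2
#define PORT_COUNT_MASK (((uintptr_t)1 << PORT_COUNT_BITS) - 1)

struct request {
	int seqno;			/* int alignment keeps the low 2 bits clear */
};

struct port {
	uintptr_t request_count;	/* pointer in the high bits, count in the low bits */
};

static inline void port_set(struct port *p, struct request *rq, unsigned int count)
{
	assert(((uintptr_t)rq & PORT_COUNT_MASK) == 0);	/* pointer must be aligned */
	assert(count <= PORT_COUNT_MASK);		/* count must fit in the spare bits */
	p->request_count = (uintptr_t)rq | count;
}

static inline struct request *port_request(const struct port *p)
{
	return (struct request *)(p->request_count & ~PORT_COUNT_MASK);
}

static inline unsigned int port_count(const struct port *p)
{
	return (unsigned int)(p->request_count & PORT_COUNT_MASK);
}
```

Keeping both values in one word means the pair can be read or written with a single load or store, and hiding the masking inside helpers keeps callers from touching the raw field.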
    • C
      drm/i915: Redefine ptr_pack_bits() and friends · 0ce81788
      Chris Wilson committed
      Rebrand the current (pointer | bits) pack/unpack utility macros as
      explicit bit twiddling for PAGE_SIZE so that we can use the more
      flexible underlying macros for different bits.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-4-chris@chris-wilson.co.uk
    • C
      drm/i915: Make ptr_unpack_bits() more function-like · 991bfc64
      Chris Wilson committed
      ptr_unpack_bits() is a function-like macro, as such it is meant to be
      replaceable by a function. In this case, we should be passing in the
      out-param as a pointer.
      
      Bizarrely this does affect code generation:
      
      function                                     old     new   delta
      i915_gem_object_pin_map                      409     389     -20
      
      An improvement(?) in this case, but one can't help but wonder what
      strict-aliasing optimisations we are preventing.
      
      The generated code looks identical in using ptr_unpack_bits (no extra
      motions to stack, the pointer and bits appear to be kept in registers),
      the difference appears to be code ordering and with a reorder it is able
      to use smaller forward jumps.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-3-chris@chris-wilson.co.uk
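To illustrate what "function-like" means for the out-parameter, the sketch below writes the unpacking step as a real function: the caller passes the address of the variable that receives the bits, so the call site makes the write visible. The name `unpack_bits()` and its signature are hypothetical and only show the calling convention, not the driver's macro.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a packed value into its pointer and its low n bits.
 * The bits are returned through an explicit pointer, exactly as a
 * function would have to do, rather than by naming the caller's variable.
 */
static inline void *unpack_bits(void *packed, unsigned long *bits, unsigned int n)
{
	uintptr_t value = (uintptr_t)packed;
	uintptr_t mask = ((uintptr_t)1 << n) - 1;

	*bits = (unsigned long)(value & mask);
	return (void *)(value & ~mask);
}
```

A call then reads `ptr = unpack_bits(packed, &bits, 2);`, and the `&bits` tells the reader at a glance that `bits` is modified.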