1. 29 8月, 2017 1 次提交
    • C
      drm/i915: Recreate vmapping even when the object is pinned · a575c676
      Chris Wilson 提交于
      Sometimes we know we are the only user of the bo, but since we take a
      protective pin_pages early on, an attempt to change the vmap on the
      object is denied because it is busy. i915_gem_object_pin_map() cannot
      tell from our single pin_count if the operation is safe. Instead we must
      pass that information down from the caller in the manner of
      I915_MAP_OVERRIDE.
      
      This issue has existed from the introduction of the mapping, but was
      never noticed as the only place where this conflict might happen is for
      cached kernel buffers (such as allocated by i915_gem_batch_pool_get()).
      Until recently there was only a single user (the cmdparser) so no
      conflicts ever occurred. However, we now use it to allocate batches for
      different operations (using MAP_WC on !llc for writes) in addition to the
      existing shadow batch (using MAP_WB for reads).
      
      We could either keep both mappings cached, or use a different write
      mechanism if we detect a MAP_WB already exists (i.e. clflush
      afterwards), but as we haven't seen this issue in the wild (it requires
      hitting the GPU reloc path in addition to the cmdparser) for simplicity
      just allow the mappings to be recreated.
      
      v2: Include the i915_MAP_OVERRIDE bit in the enum so the compiler knows
      about all the valid values.
      
      Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing")
      Testcase: igt/gem_lut_handle # byt, completely by accident
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170828104631.8606-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      a575c676
  2. 24 8月, 2017 1 次提交
  3. 18 8月, 2017 6 次提交
  4. 15 8月, 2017 2 次提交
  5. 27 7月, 2017 3 次提交
    • C
      drm/i915: Force CPU synchronisation even if userspace requests ASYNC · 0f46daa1
      Chris Wilson 提交于
      The goal here was to minimise doing any thing or any check inside the
      kernel that was not strictly required. For a userspace that assumes
      complete control over the cache domains, the kernel is usually using
      outdated information and may trigger clflushes where none were
      required.
      
      However, swapping is a situation where userspace has no knowledge of the
      domain transfer, and will leave the object in the CPU cache. The kernel
      must flush this out to the backing storage prior to use with the GPU. As
      we use an asynchronous task tracked by an implicit fence for this, we
      also need to cancel the ASYNC flag on the object so that the object will
      wait for the clflush to complete before being executed. This also absolves
      userspace of the responsibility imposed by commit 77ae9957 ("drm/i915:
      Enable userspace to opt-out of implicit fencing") that its needed to ensure
      that the object was out of the CPU cache prior to use on the GPU.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101571Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: NJason Ekstrand <jason@jlekstrand.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-5-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      0f46daa1
    • C
      drm/i915: Only skip updating execobject.offset after error · 1f727d9e
      Chris Wilson 提交于
      I was being overly paranoid in not updating the execobject.offset after
      performing the fallback copy where we set reloc.presumed_offset to -1.
      The thinking was to ensure that a subsequent NORELOC execbuf would be
      forced to process the invalid relocations. However this is overkill so
      long as we *only* update the execobject.offset following a successful
      update of the relocation value witin the batch. If we have to repeat the
      execbuf due to a later interruption, then we may skip the relocations on
      the second pass (honouring NORELOC) since the execobject.offset match
      the actual offsets (even though reloc.presumed_offset is garbage).
      
      Subsequent calls to execbuf with NORELOC should themselves ensure that
      the reloc.presumed_offset have been corrected in case of future
      migration.
      
      Reporting back the actual execobject.offset, even when
      reloc.presumed_offset is garbage, ensures that reuse of those objects
      use the latest information to avoid relocations.
      
      Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array")
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101635Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-4-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      1f727d9e
    • C
      drm/i915: Only mark the execobject as pinned on success · 1da7b54c
      Chris Wilson 提交于
      If we fail to acquire a fence (for old school fenced GPU access) then we
      unwind the vma reservation, including its pin. However, we were making
      the execobject as holding the pin before erring out, leading to a double
      unpin:
      
      [ 3193.991802] kernel BUG at drivers/gpu/drm/i915/i915_vma.h:287!
      [ 3193.998131] invalid opcode: 0000 [#1] PREEMPT SMP
      [ 3194.002816] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_analog snd_hda_codec_generic coretemp snd_hda_codec snd_hwdep snd_hda_core snd_pcm lpc_ich mei_me e1000e mei prime_numbers ptp pps_core [last unloaded: i915]
      [ 3194.022841] CPU: 0 PID: 8123 Comm: kms_flip Tainted: G     U          4.13.0-rc1-CI-CI_DRM_471+ #1
      [ 3194.031765] Hardware name: Dell Inc. OptiPlex 755                 /0PU052, BIOS A04 11/05/2007
      [ 3194.040343] task: ffff8800785d4c40 task.stack: ffffc90001768000
      [ 3194.046339] RIP: 0010:eb_release_vmas.isra.6+0x119/0x180 [i915]
      [ 3194.052234] RSP: 0018:ffffc9000176ba80 EFLAGS: 00010246
      [ 3194.057439] RAX: 00000000000003c0 RBX: ffff8800710fc2d8 RCX: ffff8800588e4f48
      [ 3194.064546] RDX: ffffffff1fffffff RSI: 00000000ffffffff RDI: ffff8800588e00d0
      [ 3194.071654] RBP: ffffc9000176bab0 R08: 0000000000000000 R09: 0000000000000000
      [ 3194.078761] R10: 0000000000000040 R11: 0000000000000001 R12: ffff880060822f00
      [ 3194.085867] R13: 0000000000000310 R14: 00000000000003b8 R15: ffffc9000176bbb0
      [ 3194.092975] FS:  00007fd2b94aba40(0000) GS:ffff88007d200000(0000) knlGS:0000000000000000
      [ 3194.101033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 3194.106754] CR2: 00007ffbec3ff000 CR3: 0000000074e67000 CR4: 00000000000006f0
      [ 3194.113861] Call Trace:
      [ 3194.116321]  eb_relocate_slow+0x67/0x4e0 [i915]
      [ 3194.120861]  i915_gem_do_execbuffer+0x429/0x1260 [i915]
      [ 3194.126070]  ? lock_acquire+0xb5/0x210
      [ 3194.129803]  ? __might_fault+0x39/0x90
      [ 3194.133563]  i915_gem_execbuffer2+0x9b/0x1b0 [i915]
      [ 3194.138447]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
      [ 3194.143478]  drm_ioctl_kernel+0x64/0xb0
      [ 3194.147298]  drm_ioctl+0x2cd/0x390
      [ 3194.150710]  ? i915_gem_execbuffer+0x2b0/0x2b0 [i915]
      [ 3194.155741]  ? finish_task_switch+0xa5/0x210
      [ 3194.159993]  ? finish_task_switch+0x6a/0x210
      [ 3194.164247]  do_vfs_ioctl+0x90/0x670
      [ 3194.167806]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
      [ 3194.172492]  ? __this_cpu_preempt_check+0x13/0x20
      [ 3194.177176]  ? trace_hardirqs_on_caller+0xe7/0x1c0
      [ 3194.181946]  SyS_ioctl+0x3c/0x70
      [ 3194.185159]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [ 3194.189756] RIP: 0033:0x7fd2b76a8587
      [ 3194.193314] RSP: 002b:00007fff074845b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [ 3194.200855] RAX: ffffffffffffffda RBX: ffffffff8146da43 RCX: 00007fd2b76a8587
      [ 3194.207962] RDX: 00007fff074846e0 RSI: 0000000040406469 RDI: 0000000000000003
      [ 3194.215068] RBP: ffffc9000176bf88 R08: 0000000000000000 R09: 0000000000000003
      [ 3194.222175] R10: 00007fd2b796bb58 R11: 0000000000000246 R12: 00007fff07484880
      [ 3194.229280] R13: 0000000000000003 R14: 0000000040406469 R15: 0000000000000000
      [ 3194.236386]  ? __this_cpu_preempt_check+0x13/0x20
      [ 3194.241070] Code: 24 b0 00 00 00 48 85 c9 0f 84 6c ff ff ff 8b 41 20 85 c0 7e 73 83 e8 01 89 41 20 41 8b 84 24 e8 00 00 00 a8 0f 0f 85 5f ff ff ff <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d f3 c3 49 8b 84
      [ 3194.259943] RIP: eb_release_vmas.isra.6+0x119/0x180 [i915] RSP: ffffc9000176ba80
      [ 3194.268047] ---[ end trace 1d7348c6575d8800 ]---
      [ 3673.658819] softdog: Initiating panic
      [ 3673.662471] Kernel panic - not syncing: Software Watchdog Timer expired
      [ 3673.669066] Kernel Offset: disabled
      [ 3673.672541] Rebooting in 1 seconds..
      Reported-by: NTomi Sarvela <tomi.p.sarvela@intel.com>
      Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170721145037.25105-3-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      1da7b54c
  6. 17 7月, 2017 1 次提交
  7. 03 7月, 2017 1 次提交
  8. 29 6月, 2017 1 次提交
  9. 26 6月, 2017 3 次提交
    • C
      drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations · 611cdf36
      Chris Wilson 提交于
      If we write a relocation into the buffer, we require our own implicit
      synchronisation added after the start of the execbuf, outside of the
      user's control. As we may end up clflushing, or doing the patch itself
      on the GPU, asynchronously we need to look at the implicit serialisation
      on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
      object.
      
      If the user does trigger a stall for relocations, we make sure the stall
      is complete enough so that the batch is not submitted before we complete
      those relocations.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit 071750e5)
      [danvet: Resolve conflicts, resolution reviewed by Tvrtko on irc.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      611cdf36
    • C
      drm/i915: Clear execbuf's vma backpointer upon release · bdbbf7d6
      Chris Wilson 提交于
      commit 2889caa9 ("drm/i915: Eliminate lots of iterations over the
      execobjects array") jiggled around the error handling and replace a test
      that we cleaned up properly after ourselves with an assertion. That
      assertion failed because in the release function (moments after the
      assertion) we were indeed forgetting to mark the vma as cleared. The
      consequence was when testing an invalid relocation address, we would try
      to release the vma twice (following the couple of attempts to verify the
      address) and on the second release notice that the first release was
      incomplete.
      
      Testcase: igt/gem_reloc_overflow/invalid-address
      Fixes: 2889caa9 ("drm/i915: Eliminate lots of iterations over the execobjects array")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170622104722.2583-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      (cherry picked from commit 51d05e1b)
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      bdbbf7d6
    • C
      drm/i915: Pass the right flags to i915_vma_move_to_active() · b88eb199
      Chris Wilson 提交于
      i915_vma_move_to_active() takes the execobject flags and not a boolean!
      Instead of passing EXEC_OBJECT_WRITE we passed true [i.e.
      EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the
      vma->last_fence access and since we forgot to clear that on unbinding,
      we caused a use-after-free.
      
      [  321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868
      
      [  321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1
      [  321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      [  321.264208] Call Trace:
      [  321.264234]  dump_stack+0x67/0x99
      [  321.264260]  print_address_description+0x77/0x290
      [  321.264437]  ? i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264459]  kasan_report+0x269/0x350
      [  321.264487]  __asan_report_load8_noabort+0x14/0x20
      [  321.264660]  i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264841]  ? intel_ring_context_pin+0x131/0x690 [i915]
      [  321.265021]  i915_gem_request_alloc+0x2c6/0x1220 [i915]
      [  321.265044]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      [  321.265226]  i915_gem_do_execbuffer+0xac0/0x2a20 [i915]
      [  321.265250]  ? __lock_acquire+0xceb/0x5450
      [  321.265269]  ? entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.265291]  ? kvmalloc_node+0x6b/0x80
      [  321.265310]  ? kvmalloc_node+0x6b/0x80
      [  321.265489]  ? eb_relocate_slow+0xbe0/0xbe0 [i915]
      [  321.265520]  ? ___slab_alloc.constprop.28+0x2ab/0x3d0
      [  321.265549]  ? debug_check_no_locks_freed+0x280/0x280
      [  321.265591]  ? __might_fault+0xc6/0x1b0
      [  321.265782]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.265815]  drm_ioctl+0x4ba/0xaa0
      [  321.265986]  ? i915_gem_execbuffer+0xde0/0xde0 [i915]
      [  321.266017]  ? drm_getunique+0x270/0x270
      [  321.266068]  do_vfs_ioctl+0x17f/0xfa0
      [  321.266091]  ? __fget+0x1ba/0x330
      [  321.266112]  ? lock_acquire+0x390/0x390
      [  321.266133]  ? ioctl_preallocate+0x1d0/0x1d0
      [  321.266164]  ? __fget+0x1db/0x330
      [  321.266194]  ? __fget_light+0x79/0x1f0
      [  321.266219]  SyS_ioctl+0x3c/0x70
      [  321.266247]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.266265] RIP: 0033:0x7fcede207357
      [  321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357
      [  321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004
      [  321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000
      [  321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98
      [  321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000
      [  321.266385]  ? __this_cpu_preempt_check+0x13/0x20
      [  321.266406]  ? trace_hardirqs_off_caller+0x1d6/0x2c0
      
      [  321.266487] Allocated by task 2868:
      [  321.266568]  save_stack_trace+0x16/0x20
      [  321.266586]  kasan_kmalloc+0xee/0x180
      [  321.266602]  kasan_slab_alloc+0x12/0x20
      [  321.266620]  kmem_cache_alloc+0xc7/0x2e0
      [  321.266795]  i915_vma_instance+0x28c/0x1540 [i915]
      [  321.266964]  eb_lookup_vmas+0x5a7/0x2250 [i915]
      [  321.267130]  i915_gem_do_execbuffer+0x69a/0x2a20 [i915]
      [  321.267296]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.267315]  drm_ioctl+0x4ba/0xaa0
      [  321.267333]  do_vfs_ioctl+0x17f/0xfa0
      [  321.267350]  SyS_ioctl+0x3c/0x70
      [  321.267369]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  321.267428] Freed by task 177:
      [  321.267502]  save_stack_trace+0x16/0x20
      [  321.267521]  kasan_slab_free+0xad/0x180
      [  321.267539]  kmem_cache_free+0xc5/0x340
      [  321.267710]  i915_vma_unbind+0x666/0x10a0 [i915]
      [  321.267880]  i915_vma_close+0x23a/0x2f0 [i915]
      [  321.268048]  __i915_gem_free_objects+0x17d/0xc70 [i915]
      [  321.268215]  __i915_gem_free_work+0x49/0x70 [i915]
      [  321.268234]  process_one_work+0x66f/0x1410
      [  321.268252]  worker_thread+0xe1/0xe90
      [  321.268269]  kthread+0x304/0x410
      [  321.268285]  ret_from_fork+0x27/0x40
      
      [  321.268346] The buggy address belongs to the object at ffff880100fc6640
                      which belongs to the cache i915_vma of size 656
      [  321.268550] The buggy address is located 408 bytes inside of
                      656-byte region [ffff880100fc6640, ffff880100fc68d0)
      [  321.268741] The buggy address belongs to the page:
      [  321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping:          (null) index:0xffff880100fc5980 compound_mapcount: 0
      [  321.269045] flags: 0x8000000000008100(slab|head)
      [  321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d
      [  321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000
      [  321.269484] page dumped because: kasan: bad access detected
      
      [  321.269665] Memory state around the buggy address:
      [  321.269778]  ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.269949]  ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270279]                                                     ^
      [  321.270410]  ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270576]  ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
      [  321.270740] ==================================================================
      [  321.270903] Disabling lock debugging due to kernel taint
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511
      Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620124321.1108-2-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      (cherry picked from commit 25ffaa67)
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b88eb199
  10. 22 6月, 2017 1 次提交
  11. 21 6月, 2017 2 次提交
    • C
      drm/i915: Pass the right flags to i915_vma_move_to_active() · 25ffaa67
      Chris Wilson 提交于
      i915_vma_move_to_active() takes the execobject flags and not a boolean!
      Instead of passing EXEC_OBJECT_WRITE we passed true [i.e.
      EXEC_OBJECT_NEEDS_FENCE] causing us to start tracking the
      vma->last_fence access and since we forgot to clear that on unbinding,
      we caused a use-after-free.
      
      [  321.263854] BUG: KASAN: use-after-free in i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264001] Read of size 8 at addr ffff880100fc67d8 by task gem_exec_reloc/2868
      
      [  321.264181] CPU: 0 PID: 2868 Comm: gem_exec_reloc Not tainted 4.12.0-rc6-CI-Custom_2759+ #1
      [  321.264195] Hardware name: GIGABYTE GB-BXBT-1900/MZBAYAB-00, BIOS F6 02/17/2015
      [  321.264208] Call Trace:
      [  321.264234]  dump_stack+0x67/0x99
      [  321.264260]  print_address_description+0x77/0x290
      [  321.264437]  ? i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264459]  kasan_report+0x269/0x350
      [  321.264487]  __asan_report_load8_noabort+0x14/0x20
      [  321.264660]  i915_gem_request_retire+0x1728/0x1740 [i915]
      [  321.264841]  ? intel_ring_context_pin+0x131/0x690 [i915]
      [  321.265021]  i915_gem_request_alloc+0x2c6/0x1220 [i915]
      [  321.265044]  ? _raw_spin_unlock_irqrestore+0x3d/0x60
      [  321.265226]  i915_gem_do_execbuffer+0xac0/0x2a20 [i915]
      [  321.265250]  ? __lock_acquire+0xceb/0x5450
      [  321.265269]  ? entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.265291]  ? kvmalloc_node+0x6b/0x80
      [  321.265310]  ? kvmalloc_node+0x6b/0x80
      [  321.265489]  ? eb_relocate_slow+0xbe0/0xbe0 [i915]
      [  321.265520]  ? ___slab_alloc.constprop.28+0x2ab/0x3d0
      [  321.265549]  ? debug_check_no_locks_freed+0x280/0x280
      [  321.265591]  ? __might_fault+0xc6/0x1b0
      [  321.265782]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.265815]  drm_ioctl+0x4ba/0xaa0
      [  321.265986]  ? i915_gem_execbuffer+0xde0/0xde0 [i915]
      [  321.266017]  ? drm_getunique+0x270/0x270
      [  321.266068]  do_vfs_ioctl+0x17f/0xfa0
      [  321.266091]  ? __fget+0x1ba/0x330
      [  321.266112]  ? lock_acquire+0x390/0x390
      [  321.266133]  ? ioctl_preallocate+0x1d0/0x1d0
      [  321.266164]  ? __fget+0x1db/0x330
      [  321.266194]  ? __fget_light+0x79/0x1f0
      [  321.266219]  SyS_ioctl+0x3c/0x70
      [  321.266247]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  321.266265] RIP: 0033:0x7fcede207357
      [  321.266279] RSP: 002b:00007ffef0effe58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [  321.266307] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fcede207357
      [  321.266321] RDX: 00007ffef0effef0 RSI: 0000000040406469 RDI: 0000000000000004
      [  321.266335] RBP: ffffffff812097c6 R08: 0000000000000008 R09: 0000000000000000
      [  321.266349] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880116bcff98
      [  321.266363] R13: ffffffff81cb7cb3 R14: ffff880116bcff70 R15: 0000000000000000
      [  321.266385]  ? __this_cpu_preempt_check+0x13/0x20
      [  321.266406]  ? trace_hardirqs_off_caller+0x1d6/0x2c0
      
      [  321.266487] Allocated by task 2868:
      [  321.266568]  save_stack_trace+0x16/0x20
      [  321.266586]  kasan_kmalloc+0xee/0x180
      [  321.266602]  kasan_slab_alloc+0x12/0x20
      [  321.266620]  kmem_cache_alloc+0xc7/0x2e0
      [  321.266795]  i915_vma_instance+0x28c/0x1540 [i915]
      [  321.266964]  eb_lookup_vmas+0x5a7/0x2250 [i915]
      [  321.267130]  i915_gem_do_execbuffer+0x69a/0x2a20 [i915]
      [  321.267296]  i915_gem_execbuffer2+0x14a/0x3f0 [i915]
      [  321.267315]  drm_ioctl+0x4ba/0xaa0
      [  321.267333]  do_vfs_ioctl+0x17f/0xfa0
      [  321.267350]  SyS_ioctl+0x3c/0x70
      [  321.267369]  entry_SYSCALL_64_fastpath+0x1c/0xb1
      
      [  321.267428] Freed by task 177:
      [  321.267502]  save_stack_trace+0x16/0x20
      [  321.267521]  kasan_slab_free+0xad/0x180
      [  321.267539]  kmem_cache_free+0xc5/0x340
      [  321.267710]  i915_vma_unbind+0x666/0x10a0 [i915]
      [  321.267880]  i915_vma_close+0x23a/0x2f0 [i915]
      [  321.268048]  __i915_gem_free_objects+0x17d/0xc70 [i915]
      [  321.268215]  __i915_gem_free_work+0x49/0x70 [i915]
      [  321.268234]  process_one_work+0x66f/0x1410
      [  321.268252]  worker_thread+0xe1/0xe90
      [  321.268269]  kthread+0x304/0x410
      [  321.268285]  ret_from_fork+0x27/0x40
      
      [  321.268346] The buggy address belongs to the object at ffff880100fc6640
                      which belongs to the cache i915_vma of size 656
      [  321.268550] The buggy address is located 408 bytes inside of
                      656-byte region [ffff880100fc6640, ffff880100fc68d0)
      [  321.268741] The buggy address belongs to the page:
      [  321.268837] page:ffffea000403f000 count:1 mapcount:0 mapping:          (null) index:0xffff880100fc5980 compound_mapcount: 0
      [  321.269045] flags: 0x8000000000008100(slab|head)
      [  321.269147] raw: 8000000000008100 0000000000000000 ffff880100fc5980 00000001001e001d
      [  321.269312] raw: ffffea0004038e20 ffff880116b46240 ffff88011646c640 0000000000000000
      [  321.269484] page dumped because: kasan: bad access detected
      
      [  321.269665] Memory state around the buggy address:
      [  321.269778]  ffff880100fc6680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.269949]  ffff880100fc6700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270115] >ffff880100fc6780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270279]                                                     ^
      [  321.270410]  ffff880100fc6800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  321.270576]  ffff880100fc6880: fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc
      [  321.270740] ==================================================================
      [  321.270903] Disabling lock debugging due to kernel taint
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101511
      Fixes: 7dd4f672 ("drm/i915: Async GPU relocation processing")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620124321.1108-2-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      25ffaa67
    • C
      drm/i915: Enable rcu-only context lookups · 1acfc104
      Chris Wilson 提交于
      Whilst the contents of the context is still protected by the big
      struct_mutex, this is not much of an improvement. It is just one tiny
      step towards reducing our BKL.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-3-chris@chris-wilson.co.uk
      1acfc104
  12. 16 6月, 2017 12 次提交
    • C
      drm/i915: Stash a pointer to the obj's resv in the vma · 95ff7c7d
      Chris Wilson 提交于
      During execbuf, a mandatory step is that we add this request (this
      fence) to each object's reservation_object. Inside execbuf, we track the
      vma, and to add the fence to the reservation_object then means having to
      first chase the obj, incurring another cache miss. We can reduce the
       number of cache misses by stashing a pointer to the reservation_object
      in the vma itself.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
      95ff7c7d
    • C
      drm/i915: Async GPU relocation processing · 7dd4f672
      Chris Wilson 提交于
      If the user requires patching of their batch or auxiliary buffers, we
      currently make the alterations on the cpu. If they are active on the GPU
      at the time, we wait under the struct_mutex for them to finish executing
      before we rewrite the contents. This happens if shared relocation trees
      are used between different contexts with separate address space (and the
      buffers then have different addresses in each), the 3D state will need
      to be adjusted between execution on each context. However, we don't need
      to use the CPU to do the relocation patching, as we could queue commands
      to the GPU to perform it and use fences to serialise the operation with
      the current activity and future - so the operation on the GPU appears
      just as atomic as performing it immediately. Performing the relocation
      rewrites on the GPU is not free, in terms of pure throughput, the number
      of relocations/s is about halved - but more importantly so is the time
      under the struct_mutex.
      
      v2: Break out the request/batch allocation for clearer error flow.
      v3: A few asserts to ensure rq ordering is maintained
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      7dd4f672
    • C
      drm/i915: Allow execbuffer to use the first object as the batch · 1a71cf2f
      Chris Wilson 提交于
      Currently, the last object in the execlist is the always the batch.
      However, when building the batch buffer we often know the batch object
      first and if we can use the first slot in the execlist we can emit
      relocation instructions relative to it immediately and avoid a separate
      pass to adjust the relocations to point to the last execlist slot.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      1a71cf2f
    • C
      drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Chris Wilson 提交于
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
      8a2421bd
    • C
      drm/i915: First try the previous execbuffer location · 616d9cee
      Chris Wilson 提交于
      When choosing a slot for an execbuffer, we ideally want to use the same
      address as last time (so that we don't have to rebind it) and the same
      address as expected by the user (so that we don't have to fixup any
      relocations pointing to it). If we first try to bind the incoming
      execbuffer->offset from the user, or the currently bound offset that
      should hopefully achieve the goal of avoiding the rebind cost and the
      relocation penalty. However, if the object is not currently bound there
      we don't want to arbitrarily unbind an object in our chosen position and
      so choose to rebind/relocate the incoming object instead. After we
      report the new position back to the user, on the next pass the
      relocations should have settled down.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtien@linux.intel.com>
      616d9cee
    • C
      drm/i915: Store a persistent reference for an object in the execbuffer cache · dade2a61
      Chris Wilson 提交于
      If we take a reference to the object/vma when it is first used in an
      execbuf, we can keep that reference until the object's file-local handle
      is closed. Thereby saving a frequent ref/unref pair.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      dade2a61
    • C
      drm/i915: Eliminate lots of iterations over the execobjects array · 2889caa9
      Chris Wilson 提交于
      The major scaling bottleneck in execbuffer is the processing of the
      execobjects. Creating an auxiliary list is inefficient when compared to
      using the execobject array we already have allocated.
      
      Reservation is then split into phases. As we lookup up the VMA, we
      try and bind it back into active location. Only if that fails, do we add
      it to the unbound list for phase 2. In phase 2, we try and add all those
      objects that could not fit into their previous location, with fallback
      to retrying all objects and evicting the VM in case of severe
      fragmentation. (This is the same as before, except that phase 1 is now
      done inline with looking up the VMA to avoid an iteration over the
      execobject array. In the ideal case, we eliminate the separate reservation
      phase). During the reservation phase, we only evict from the VM between
      passes (rather than currently as we try to fit every new VMA). In
      testing with Unreal Engine's Atlantis demo which stresses the eviction
      logic on gen7 class hardware, this speed up the framerate by a factor of
      2.
      
      The second loop amalgamation is between move_to_gpu and move_to_active.
      As we always submit the request, even if incomplete, we can use the
      current request to track active VMA as we perform the flushes and
      synchronisation required.
      
      The next big advancement is to avoid copying back to the user any
      execobjects and relocations that are not changed.
      
      v2: Add a Theory of Operation spiel.
      v3: Fall back to slow relocations in preparation for flushing userptrs.
      v4: Document struct members, factor out eb_validate_vma(), add a few
      more comments to explain some magic and hide other magic behind macros.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      2889caa9
    • C
      drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations · 071750e5
      Chris Wilson 提交于
      If we write a relocation into the buffer, we require our own implicit
      synchronisation added after the start of the execbuf, outside of the
      user's control. As we may end up clflushing, or doing the patch itself
      on the GPU, asynchronously we need to look at the implicit serialisation
      on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
      object.
      
      If the user does trigger a stall for relocations, we make sure the stall
      is complete enough so that the batch is not submitted before we complete
      those relocations.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      071750e5
    • C
      drm/i915: Pass vma to relocate entry · 507d977f
      Chris Wilson 提交于
      We can simplify our tracking of pending writes in an execbuf to the
      single bit in the vma->exec_entry->flags, but that requires the
      relocation function knowing the object's vma. Pass it along.
      
      Note we have only been using a single bit to track flushing since
      
      commit cc889e0f
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Jun 13 20:45:19 2012 +0200
      
          drm/i915: disable flushing_list/gpu_write_list
      
      unconditionally flushed all render caches before the breadcrumb and
      
      commit 6ac42f41
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sat Jul 21 12:25:01 2012 +0200
      
          drm/i915: Replace the complex flushing logic with simple invalidate/flush all
      
      did away with the explicit GPU domain tracking. This was then codified
      into the ABI with NO_RELOC in
      
      commit ed5982e6
      Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
      Date:   Thu Jan 17 22:23:36 2013 +0100
      
          drm/i915: Allow userspace to hint that the relocations were known
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      507d977f
    • C
      drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson 提交于
      The advent of full-ppgtt lead to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hastable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      4ff4b44c
    • C
      drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty · 7fc92e96
      Chris Wilson 提交于
      For ease of use (i.e. avoiding a few checks and function calls), store
      the object's cache coherency next to the cache is dirty bit.
      
      Specifically this patch aims to reduce the frequency of no-op calls to
      i915_gem_object_clflush() to counter-act the increase of such calls for
      GPU only objects in the previous patch.
      
      v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
      !cache_coherent as gcc generates much better code for the latter
      (Tvrtko)
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: NDongwon Kim <dongwon.kim@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      7fc92e96
    • C
      drm/i915: Mark CPU cache as dirty on every transition for CPU writes · e27ab73d
      Chris Wilson 提交于
      Currently, we only mark the CPU cache as dirty if we skip a clflush.
      This leads to some confusion where we have to ask if the object is in
      the write domain or missed a clflush. If we always mark the cache as
      dirty, this becomes a much simply question to answer.
      
      The goal remains to do as few clflushes as required and to do them as
      late as possible, in the hope of deferring the work to a kthread and not
      block the caller (e.g. execbuf, flips).
      
      v2: Always call clflush before GPU execution when the cache_dirty flag
      is set. This may cause some extra work on llc systems that migrate dirty
      buffers back and forth - but we do try to limit that by only setting
      cache_dirty at the end of the gpu sequence.
      
      v3: Always mark the cache as dirty upon a level change, as we need to
      invalidate any stale cachelines due to external writes.
      Reported-by: NDongwon Kim <dongwon.kim@intel.com>
      Fixes: a6a7cc4b ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: NDongwon Kim <dongwon.kim@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      e27ab73d
  13. 15 6月, 2017 3 次提交
  14. 18 5月, 2017 1 次提交
  15. 15 4月, 2017 1 次提交
  16. 29 3月, 2017 1 次提交