1. 10 Nov, 2016 1 commit
    • drm/i915: Trim the object sg table · 0c40ce13
      Tvrtko Ursulin committed
      At the moment we allocate enough sg table entries assuming we
      will not be able to do any coalescing. But since in practice
      we most often can, and very effectively, this ends up
      wasting a lot of memory.
      
      A simple and effective way of trimming the over-allocated
      entries is to copy the table over to a new one allocated to the
      exact size.
      
      Experiments on my freshly logged-in and idle desktop (KDE) showed
      that by doing this we can save approximately 1 MiB of RAM, or
      when running a typical benchmark like gl_manhattan I have
      even seen a 6 MiB saving.
      
      More complicated techniques such as only copying the last used
      page and freeing the rest are left to the reader.
      
      v2:
       * Update commit message.
       * Use temporary sg_table on stack. (Chris Wilson)
      
      v3:
       * Commit message update.
       * Comment added.
       * Replace memcpy with copy assignment.
         (Chris Wilson)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1478704423-7447-1-git-send-email-tvrtko.ursulin@linux.intel.com
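The trimming idea described in this commit can be sketched in plain userspace C. This is a minimal model, not the driver code: `struct toy_sg_entry`, `struct toy_sg_table`, `toy_sg_build()` and `toy_sg_trim()` are hypothetical names, and a flat array stands in for the kernel's chained scatterlist. The sketch allocates one entry per page (the worst case), coalesces adjacent pages, and then copies the used entries into an allocation of exactly the right size via a temporary table on the stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scatter-gather entry: a run of contiguous pages. */
struct toy_sg_entry {
    unsigned long first_pfn; /* first page frame number of the run */
    unsigned long npages;    /* number of contiguous pages in the run */
};

/* Hypothetical stand-in for an sg table (a flat array, unlike the kernel's). */
struct toy_sg_table {
    struct toy_sg_entry *ents;
    size_t nents;      /* entries actually in use */
    size_t orig_nents; /* entries allocated */
};

/*
 * Build a table from a list of page frame numbers. One entry per page is
 * allocated up front (no coalescing assumed), then adjacent pages are merged.
 * Returns 0 on success, -1 on allocation failure.
 */
static int toy_sg_build(struct toy_sg_table *st, const unsigned long *pfns, size_t n)
{
    size_t alloc = n ? n : 1;

    st->ents = calloc(alloc, sizeof(*st->ents));
    if (!st->ents)
        return -1;
    st->orig_nents = alloc;
    st->nents = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_sg_entry *last = st->nents ? &st->ents[st->nents - 1] : NULL;

        if (last && last->first_pfn + last->npages == pfns[i]) {
            last->npages++; /* extends the previous run */
        } else {
            st->ents[st->nents].first_pfn = pfns[i];
            st->ents[st->nents].npages = 1;
            st->nents++;
        }
    }
    return 0;
}

/*
 * Trim the over-allocated table by copying the used entries into a new,
 * exactly sized allocation. Trimming is only an optimisation, so on
 * allocation failure the original table is left untouched.
 */
static void toy_sg_trim(struct toy_sg_table *st)
{
    struct toy_sg_table trimmed; /* temporary table on the stack */

    if (st->nents == 0 || st->nents == st->orig_nents)
        return; /* nothing to save */

    trimmed.ents = malloc(st->nents * sizeof(*trimmed.ents));
    if (!trimmed.ents)
        return;

    memcpy(trimmed.ents, st->ents, st->nents * sizeof(*trimmed.ents));
    trimmed.nents = st->nents;
    trimmed.orig_nents = st->nents;

    free(st->ents);
    *st = trimmed; /* struct copy assignment replaces the old table */
}
```

With six pages forming three contiguous runs, `toy_sg_build()` leaves three of six entries unused and `toy_sg_trim()` shrinks the allocation to three.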
  2. 08 Nov, 2016 1 commit
  3. 07 Nov, 2016 5 commits
  4. 04 Nov, 2016 1 commit
  5. 03 Nov, 2016 1 commit
  6. 02 Nov, 2016 2 commits
  7. 01 Nov, 2016 6 commits
    • drm/i915: Improve lockdep tracking for obj->mm.lock · 548625ee
      Chris Wilson committed
      The shrinker may appear to recurse into obj->mm.lock as the shrinker may
      be called from a direct reclaim path whilst handling get_pages. We
      filter out recursing on the same obj->mm.lock by inspecting
      obj->mm.pages, but we do want to take the lock on a second object in
      order to reap their pages. lockdep spots the recursion on the same
      lockclass and needs annotation to avoid a false positive. To keep the
      two paths distinct, create an enum to indicate which subclass of
      obj->mm.lock we are using. This removes the false positive and avoids
      masking real bugs.
      Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101121134.27504-1-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
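The role of a lock subclass can be illustrated with a toy validator in userspace C. This is only a model of the idea, not lockdep itself: `enum toy_mm_subclass`, `struct toy_lock_tracker`, `toy_acquire()` and `toy_release()` are hypothetical names. The tracker treats a second acquisition of the same (class, subclass) pair as possible recursion, while an acquisition under a different subclass is accepted as a distinct, intended nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subclasses of a single lock class; names are illustrative. */
enum toy_mm_subclass {
    TOY_MM_NORMAL = 0, /* taken on the ordinary get_pages path */
    TOY_MM_SHRINKER,   /* taken by the shrinker on a second object */
    TOY_MM_NR_SUBCLASSES
};

/* Records which subclasses of the class the current context holds. */
struct toy_lock_tracker {
    bool held[TOY_MM_NR_SUBCLASSES];
};

/*
 * Returns true if the acquisition looks fine, false if the same subclass is
 * already held, which a validator would report as possible recursive locking.
 */
static bool toy_acquire(struct toy_lock_tracker *t, enum toy_mm_subclass sub)
{
    if (t->held[sub])
        return false;
    t->held[sub] = true;
    return true;
}

/* Marks the subclass as no longer held. */
static void toy_release(struct toy_lock_tracker *t, enum toy_mm_subclass sub)
{
    t->held[sub] = false;
}
```

Keeping the two paths in separate subclasses means that a genuine recursion within one path (the same subclass taken twice) is still flagged, which matches the stated goal of removing the false positive without masking real bugs.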
    • drm/i915: Store the vma in an rbtree under the object · db6c2b41
      Chris Wilson committed
      With full-ppgtt one of the main bottlenecks is the lookup of the VMA
      underneath the object. For execbuf there is merit in having a very fast
      direct lookup of ctx:handle to the vma using a hashtree, but that still
      leaves a large number of other lookups. One way to speed up the lookup
      would be to use a rhashtable, but that requires extra allocations and
      may exhibit poor worst-case behaviour. An alternative is to use an
      embedded rbtree, i.e. no extra allocations and deterministic behaviour,
      but at the slight cost of O(lgN) lookups (instead of O(1) for
      rhashtable). The majority of such trees will be very shallow and so not much
      slower, and still scale much, much better than the current unsorted
      list.
      
      v2: Bump vma_compare() to return a long, as we return the result of
      comparing two pointers.
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=87726
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101115400.15647-1-chris@chris-wilson.co.uk
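The embedded-tree lookup can be sketched as follows, under loud assumptions: this is a plain binary search tree with the rebalancing of a real rbtree omitted for brevity, and `struct toy_vma`, `toy_vma_compare()`, `toy_vma_insert()` and `toy_vma_lookup()` are hypothetical names. The tree links live inside the object itself, so insertion needs no extra allocation, and the comparison returns a `long` in the spirit of the v2 note (a raw difference of two pointers does not fit in an `int` on 64-bit systems).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMA keyed by the address space it belongs to. */
struct toy_vma {
    const void *vm;               /* key: owning address space */
    struct toy_vma *left, *right; /* tree links embedded in the object */
};

/*
 * Three-way comparison of a node against a key. The result is a long so the
 * signature would also be safe if it returned a raw pointer difference.
 */
static long toy_vma_compare(const struct toy_vma *vma, const void *vm)
{
    uintptr_t a = (uintptr_t)vma->vm;
    uintptr_t b = (uintptr_t)vm;

    return (long)(a > b) - (long)(a < b);
}

/* Walk down from the root; cost is proportional to the tree depth. */
static struct toy_vma *toy_vma_lookup(struct toy_vma *root, const void *vm)
{
    while (root) {
        long cmp = toy_vma_compare(root, vm);

        if (cmp == 0)
            return root;
        root = cmp < 0 ? root->right : root->left;
    }
    return NULL;
}

/* Link a node into the tree without allocating (rebalancing omitted). */
static void toy_vma_insert(struct toy_vma **root, struct toy_vma *vma)
{
    struct toy_vma **link = root;

    vma->left = vma->right = NULL;
    while (*link) {
        long cmp = toy_vma_compare(*link, vma->vm);

        link = cmp < 0 ? &(*link)->right : &(*link)->left;
    }
    *link = vma;
}
```

A real red-black tree adds recolouring and rotations on insert to bound the depth at O(lgN); the lookup loop itself has the same shape as the one shown here.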
    • drm/i915: Track pages pinned due to swizzling quirk · bc0629a7
      Chris Wilson committed
      If we have a tiled object and an unknown CPU swizzle pattern, we pin the
      pages to prevent the object from being swapped out (and us corrupting
      the contents as we do not know the access pattern and so cannot convert
      it to linear and back to tiled on reuse). This requires us to remember
      to drop the extra pinning when freeing the object, or else we trigger
      warnings about the pin leak. In commit fbbd37b3 ("drm/i915: Move
      object release to a freelist + worker"), the object free path was
      deferred to a worker, but the unpinning of the quirk, along with marking
      the object as reclaimable, was left on the immediate path (so that if
      required we could reclaim the pages under memory pressure as early as
      possible). However, this split introduced a bug where the pages were no
      longer being unpinned if they were marked as unneeded.
      
      [  231.800401] WARNING: CPU: 1 PID: 90 at drivers/gpu/drm/i915/i915_gem.c:4275 __i915_gem_free_objects+0x326/0x3c0 [i915]
      [  231.800403] WARN_ON(i915_gem_object_has_pinned_pages(obj))
      [  231.800405] Modules linked in:
      [  231.800406]  snd_hda_intel i915 snd_hda_codec_generic mei_me snd_hda_codec coretemp snd_hwdep mei lpc_ich snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: i915]
      [  231.800426] CPU: 1 PID: 90 Comm: kworker/1:4 Tainted: G     U          4.9.0-rc2-CI-CI_DRM_1780+ #1
      [  231.800428] Hardware name: LENOVO 7465CTO/7465CTO, BIOS 6DET44WW (2.08 ) 04/22/2009
      [  231.800456] Workqueue: events __i915_gem_free_work [i915]
      [  231.800459]  ffffc9000034fc80 ffffffff8142dd65 ffffc9000034fcd0 0000000000000000
      [  231.800465]  ffffc9000034fcc0 ffffffff8107e4e6 000010b300000001 0000000000001000
      [  231.800469]  ffff88011d3db740 ffff880130ef0000 0000000000000000 ffff880130ef5ea0
      [  231.800474] Call Trace:
      [  231.800479]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
      [  231.800484]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
      [  231.800487]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
      [  231.800491]  [<ffffffff811d12ac>] ? kmem_cache_free+0x2dc/0x340
      [  231.800520]  [<ffffffffa009ef36>] __i915_gem_free_objects+0x326/0x3c0 [i915]
      [  231.800548]  [<ffffffffa009effe>] __i915_gem_free_work+0x2e/0x50 [i915]
      [  231.800552]  [<ffffffff8109c27c>] process_one_work+0x1ec/0x6b0
      [  231.800555]  [<ffffffff8109c1f6>] ? process_one_work+0x166/0x6b0
      [  231.800558]  [<ffffffff8109c789>] worker_thread+0x49/0x490
      [  231.800561]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
      [  231.800563]  [<ffffffff8109c740>] ? process_one_work+0x6b0/0x6b0
      [  231.800566]  [<ffffffff810a2aab>] kthread+0xeb/0x110
      [  231.800569]  [<ffffffff810a29c0>] ? kthread_park+0x60/0x60
      [  231.800573]  [<ffffffff818164a7>] ret_from_fork+0x27/0x40
      
      Moving to a separate flag for tracking the quirked pin is overkill for
      the bug (since we only have to interchange the two tests in
      i915_gem_free_object) but it does reduce a complicated test on all
      objects and provide a sanity check for uncommon code paths.
      
      Fixes: fbbd37b3 ("drm/i915: Move object release to a freelist + worker")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-2-chris@chris-wilson.co.uk
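A separate flag for the quirk pin can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the driver's data structures: `struct toy_obj` and the `toy_obj_*()` helpers are hypothetical. The flag records whether the quirk currently owns one pin, so the immediate free path can drop exactly that pin regardless of whether the pages were marked as unneeded, and the deferred path can check that nothing leaked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object with pinned backing pages. */
struct toy_obj {
    int pages_pin_count; /* outstanding pins on the backing pages */
    bool quirked;        /* true while the swizzle quirk holds one pin */
    bool dontneed;       /* pages marked as unneeded (reclaimable) */
};

/* Take the quirk pin once; repeated calls do not stack. */
static void toy_obj_set_quirk(struct toy_obj *obj)
{
    if (obj->quirked)
        return;
    obj->pages_pin_count++;
    obj->quirked = true;
}

/* Drop the quirk pin if, and only if, the quirk holds one. */
static void toy_obj_clear_quirk(struct toy_obj *obj)
{
    if (!obj->quirked)
        return;
    obj->pages_pin_count--;
    obj->quirked = false;
}

/*
 * Immediate part of the free path: release the quirk pin first, whatever the
 * dontneed state, then mark the pages as reclaimable.
 */
static void toy_obj_free_prepare(struct toy_obj *obj)
{
    toy_obj_clear_quirk(obj);
    obj->dontneed = true;
}

/* Deferred part of the free path: returns true if no pin has leaked. */
static bool toy_obj_free_deferred(const struct toy_obj *obj)
{
    return obj->pages_pin_count == 0;
}
```

Because the unpin is keyed on the flag alone, an object whose pages were already marked as unneeded still reaches the deferred check with a pin count of zero.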
    • drm/i915: Avoid accessing request->timeline outside of its lifetime · cb399eab
      Chris Wilson committed
      Whilst waiting on a request, we may do so without holding any locks or
      any guards beyond a reference to the request. In order to avoid taking
      locks within request deallocation, we drop references to its timeline
      (via the context and ppgtt) upon retirement. We should avoid chasing
      such pointers outside of their control; in particular, we inspect the
      request->timeline to see if we may restore the RPS waitboost for a
      client. If we instead look at the engine->timeline, we will have similar
      behaviour on both full-ppgtt and !full-ppgtt systems and reduce the
      amount of reward we give towards stalling clients (i.e. only if the
      client stalls and the GPU is uncontended does it reclaim its boost).
      This restores behaviour back to pre-timelines, whilst fixing:
      
      [  645.078485] BUG: KASAN: use-after-free in i915_gem_object_wait_fence+0x1ee/0x2e0 at addr ffff8802335643a0
      [  645.078577] Read of size 4 by task gem_exec_schedu/28408
      [  645.078638] CPU: 1 PID: 28408 Comm: gem_exec_schedu Not tainted 4.9.0-rc2+ #64
      [  645.078724] Hardware name:                  /        , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
      [  645.078816]  ffff88022daef9a0 ffffffff8143d059 ffff880235402a80 ffff880233564200
      [  645.078998]  ffff88022daef9c8 ffffffff81229c5c ffff88022daefa48 ffff880233564200
      [  645.079172]  ffff880235402a80 ffff88022daefa38 ffffffff81229ef0 000000008110a796
      [  645.079345] Call Trace:
      [  645.079404]  [<ffffffff8143d059>] dump_stack+0x68/0x9f
      [  645.079467]  [<ffffffff81229c5c>] kasan_object_err+0x1c/0x70
      [  645.079534]  [<ffffffff81229ef0>] kasan_report_error+0x1f0/0x4b0
      [  645.079601]  [<ffffffff8122a244>] kasan_report+0x34/0x40
      [  645.079676]  [<ffffffff81634f5e>] ? i915_gem_object_wait_fence+0x1ee/0x2e0
      [  645.079741]  [<ffffffff81229951>] __asan_load4+0x61/0x80
      [  645.079807]  [<ffffffff81634f5e>] i915_gem_object_wait_fence+0x1ee/0x2e0
      [  645.079876]  [<ffffffff816364bf>] i915_gem_object_wait+0x19f/0x590
      [  645.079944]  [<ffffffff81636320>] ? i915_gem_object_wait_priority+0x500/0x500
      [  645.080016]  [<ffffffff8110fb30>] ? debug_show_all_locks+0x1e0/0x1e0
      [  645.080084]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
      [  645.080157]  [<ffffffff8110a796>] ? __lock_is_held+0x46/0xc0
      [  645.080226]  [<ffffffff8163bc61>] ? i915_gem_set_domain_ioctl+0x141/0x690
      [  645.080296]  [<ffffffff8163bcc2>] i915_gem_set_domain_ioctl+0x1a2/0x690
      [  645.080366]  [<ffffffff811f8f85>] ? __might_fault+0x75/0xe0
      [  645.080433]  [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
      [  645.080508]  [<ffffffff8163bb20>] ? i915_gem_obj_prepare_shmem_write+0x3a0/0x3a0
      [  645.080603]  [<ffffffff815a52d0>] ? drm_ioctl_permit+0x120/0x120
      [  645.080670]  [<ffffffff8110abdc>] ? check_chain_key+0x14c/0x210
      [  645.080738]  [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
      [  645.080804]  [<ffffffff8120268c>] ? do_mmap+0x47c/0x580
      [  645.080871]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
      [  645.080938]  [<ffffffff812755f0>] ? ioctl_preallocate+0x150/0x150
      [  645.081011]  [<ffffffff81108c53>] ? up_write+0x23/0x50
      [  645.081078]  [<ffffffff811da567>] ? vm_mmap_pgoff+0x117/0x140
      [  645.081145]  [<ffffffff811da450>] ? vma_is_stack_for_current+0x90/0x90
      [  645.081214]  [<ffffffff8110d853>] ? mark_held_locks+0x23/0xc0
      [  645.082030]  [<ffffffff81288408>] ? __fget+0x168/0x250
      [  645.082106]  [<ffffffff819ad517>] ? entry_SYSCALL_64_fastpath+0x5/0xb1
      [  645.082176]  [<ffffffff81288592>] ? __fget_light+0xa2/0xc0
      [  645.082242]  [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
      [  645.082309]  [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  645.082374] Object at ffff880233564200, in cache kmalloc-8192 size: 8192
      [  645.082431] Allocated:
      [  645.082480] PID = 28408
      [  645.082535]  [  645.082566] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
      [  645.082623]  [  645.082656] [<ffffffff81228b06>] save_stack+0x46/0xd0
      [  645.082716]  [  645.082756] [<ffffffff812292fd>] kasan_kmalloc+0xad/0xe0
      [  645.082817]  [  645.082848] [<ffffffff81631752>] i915_ppgtt_create+0x52/0x220
      [  645.082908]  [  645.082941] [<ffffffff8161db96>] i915_gem_create_context+0x396/0x560
      [  645.083027]  [  645.083059] [<ffffffff8161f857>] i915_gem_context_create_ioctl+0x97/0xf0
      [  645.083152]  [  645.083183] [<ffffffff815a55f7>] drm_ioctl+0x327/0x640
      [  645.083243]  [  645.083274] [<ffffffff81275717>] do_vfs_ioctl+0x127/0xa20
      [  645.083334]  [  645.083372] [<ffffffff8127604c>] SyS_ioctl+0x3c/0x70
      [  645.083432]  [  645.083464] [<ffffffff819ad52e>] entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  645.083551] Freed:
      [  645.083599] PID = 27629
      [  645.083648]  [  645.083676] [<ffffffff8103ae66>] save_stack_trace+0x16/0x20
      [  645.083738]  [  645.083770] [<ffffffff81228b06>] save_stack+0x46/0xd0
      [  645.083830]  [  645.083862] [<ffffffff81229203>] kasan_slab_free+0x73/0xc0
      [  645.083922]  [  645.083961] [<ffffffff812279c9>] kfree+0xa9/0x170
      [  645.084021]  [  645.084053] [<ffffffff81629f60>] i915_ppgtt_release+0x100/0x180
      [  645.084139]  [  645.084171] [<ffffffff8161d414>] i915_gem_context_free+0x1b4/0x230
      [  645.084257]  [  645.084288] [<ffffffff816537b2>] intel_lr_context_unpin+0x192/0x230
      [  645.084380]  [  645.084413] [<ffffffff81645250>] i915_gem_request_retire+0x620/0x630
      [  645.084500]  [  645.085226] [<ffffffff816473d1>] i915_gem_retire_requests+0x181/0x280
      [  645.085313]  [  645.085352] [<ffffffff816352ba>] i915_gem_retire_work_handler+0xca/0xe0
      [  645.085440]  [  645.085471] [<ffffffff810c725b>] process_one_work+0x4fb/0x920
      [  645.085532]  [  645.085562] [<ffffffff810c770d>] worker_thread+0x8d/0x840
      [  645.085622]  [  645.085653] [<ffffffff810d21e5>] kthread+0x185/0x1b0
      [  645.085718]  [  645.085750] [<ffffffff819ad7a7>] ret_from_fork+0x27/0x40
      [  645.085811] Memory state around the buggy address:
      [  645.085869]  ffff880233564280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.085956]  ffff880233564300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086053] >ffff880233564380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086138]                                ^
      [  645.086193]  ffff880233564400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  645.086283]  ffff880233564480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      v2: Add a comment to document the hint-like nature of
       intel_engine_last_submit()
      
      Fixes: 73cb9701 ("drm/i915: Combine seqno + tracking into a global timeline struct")
      Fixes: 80b204bc ("drm/i915: Enable multiple timelines")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101100317.11129-1-chris@chris-wilson.co.uk
    • drm/i915: Use the full hammer when shutting down the rcu tasks · 7d5d59e5
      Chris Wilson committed
      To flush all call_rcu() tasks (here from i915_gem_free_object()) we need
      to call rcu_barrier() (not synchronize_rcu()). If we don't then we may
      still have objects being freed as we continue to tear down the driver -
      in particular, the recently released rings may race with the memory
      manager shutdown resulting in sporadic:
      
      [  142.217186] WARNING: CPU: 7 PID: 6185 at drivers/gpu/drm/drm_mm.c:932 drm_mm_takedown+0x2e/0x40
      [  142.217187] Memory manager not clean during takedown.
      [  142.217187] Modules linked in: i915(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel lpc_ich snd_hda_codec_realtek snd_hda_codec_generic mei_me mei snd_hda_codec_hdmi snd_hda_codec snd_hwdep snd_hda_core snd_pcm e1000e ptp pps_core [last unloaded: snd_hda_intel]
      [  142.217199] CPU: 7 PID: 6185 Comm: rmmod Not tainted 4.9.0-rc2-CI-Trybot_242+ #1
      [  142.217199] Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
      [  142.217200]  ffffc90002ecfce0 ffffffff8142dd65 ffffc90002ecfd30 0000000000000000
      [  142.217202]  ffffc90002ecfd20 ffffffff8107e4e6 000003a40778c2a8 ffff880401355c48
      [  142.217204]  ffff88040778c2a8 ffffffffa040f3c0 ffffffffa040f4a0 00005621fbf8b1f0
      [  142.217206] Call Trace:
      [  142.217209]  [<ffffffff8142dd65>] dump_stack+0x67/0x92
      [  142.217211]  [<ffffffff8107e4e6>] __warn+0xc6/0xe0
      [  142.217213]  [<ffffffff8107e54a>] warn_slowpath_fmt+0x4a/0x50
      [  142.217214]  [<ffffffff81559e3e>] drm_mm_takedown+0x2e/0x40
      [  142.217236]  [<ffffffffa035c02a>] i915_gem_cleanup_stolen+0x1a/0x20 [i915]
      [  142.217246]  [<ffffffffa034c581>] i915_ggtt_cleanup_hw+0x31/0xb0 [i915]
      [  142.217253]  [<ffffffffa0310311>] i915_driver_cleanup_hw+0x31/0x40 [i915]
      [  142.217260]  [<ffffffffa0312001>] i915_driver_unload+0x141/0x1a0 [i915]
      [  142.217268]  [<ffffffffa031c2c4>] i915_pci_remove+0x14/0x20 [i915]
      [  142.217269]  [<ffffffff8147d214>] pci_device_remove+0x34/0xb0
      [  142.217271]  [<ffffffff8157b14c>] __device_release_driver+0x9c/0x150
      [  142.217272]  [<ffffffff8157bcc6>] driver_detach+0xb6/0xc0
      [  142.217273]  [<ffffffff8157abe3>] bus_remove_driver+0x53/0xd0
      [  142.217274]  [<ffffffff8157c787>] driver_unregister+0x27/0x50
      [  142.217276]  [<ffffffff8147c265>] pci_unregister_driver+0x25/0x70
      [  142.217287]  [<ffffffffa03d764c>] i915_exit+0x1a/0x71 [i915]
      [  142.217289]  [<ffffffff811136b3>] SyS_delete_module+0x193/0x1e0
      [  142.217291]  [<ffffffff818174ae>] entry_SYSCALL_64_fastpath+0x1c/0xb1
      [  142.217292] ---[ end trace 6fd164859c154772 ]---
      [  142.217505] [drm:show_leaks] *ERROR* node [6b6b6b6b6b6b6b6b + 6b6b6b6b6b6b6b6b]: inserted at
                      [<ffffffff81559ff3>] save_stack.isra.1+0x53/0xa0
                      [<ffffffff8155a98d>] drm_mm_insert_node_in_range_generic+0x2ad/0x360
                      [<ffffffffa035bf23>] i915_gem_stolen_insert_node_in_range+0x93/0xe0 [i915]
                      [<ffffffffa035c855>] i915_gem_object_create_stolen+0x75/0xb0 [i915]
                      [<ffffffffa036a51a>] intel_engine_create_ring+0x9a/0x140 [i915]
                      [<ffffffffa036a921>] intel_init_ring_buffer+0xf1/0x440 [i915]
                      [<ffffffffa036be1b>] intel_init_render_ring_buffer+0xab/0x1b0 [i915]
                      [<ffffffffa0363d08>] intel_engines_init+0xc8/0x210 [i915]
                      [<ffffffffa0355d7c>] i915_gem_init+0xac/0xf0 [i915]
                      [<ffffffffa0311454>] i915_driver_load+0x9c4/0x1430 [i915]
                      [<ffffffffa031c2f8>] i915_pci_probe+0x28/0x40 [i915]
                      [<ffffffff8147d315>] pci_device_probe+0x85/0xf0
                      [<ffffffff8157b7ff>] driver_probe_device+0x21f/0x430
                      [<ffffffff8157baee>] __driver_attach+0xde/0xe0
      
      In particular note that the node was being poisoned as we inspected the
      list, a clear indication that the object is being freed as we make the
      assertion.
      
      v2: Don't loop, just assert that we do all the work required as that
      will be better at detecting further errors.
      
      Fixes: fbbd37b3 ("drm/i915: Move object release to a freelist + worker")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161101084843.3961-1-chris@chris-wilson.co.uk
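The difference between waiting for readers and waiting for queued callbacks can be shown with a single-threaded toy model. It is not how RCU is implemented: `struct toy_deferred`, `toy_call_deferred()`, `toy_barrier()` and `toy_free_object()` are hypothetical names standing in for a callback queue, `call_rcu()`, `rcu_barrier()` and a deferred free. The point of the sketch is that queued frees have not happened yet when the queueing call returns, so teardown must run a barrier that drains every pending callback before asserting that nothing is left.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CALLBACKS 16

typedef void (*toy_callback_fn)(void *data);

/* Hypothetical queue of deferred callbacks. */
struct toy_deferred {
    toy_callback_fn fn[TOY_MAX_CALLBACKS];
    void *data[TOY_MAX_CALLBACKS];
    size_t count;
};

/* Queue a callback for later; returns 0 on success, -1 if the queue is full. */
static int toy_call_deferred(struct toy_deferred *q, toy_callback_fn fn, void *data)
{
    if (q->count == TOY_MAX_CALLBACKS)
        return -1;
    q->fn[q->count] = fn;
    q->data[q->count] = data;
    q->count++;
    return 0;
}

/*
 * Barrier: run every queued callback, including any queued while draining,
 * and return how many were run. Afterwards the queue is empty.
 */
static size_t toy_barrier(struct toy_deferred *q)
{
    size_t ran = 0;

    while (ran < q->count) {
        q->fn[ran](q->data[ran]);
        ran++;
    }
    q->count = 0;
    return ran;
}

/* Example callback: "free" one object by decrementing a live counter. */
static void toy_free_object(void *data)
{
    int *live = data;

    (*live)--;
}
```

In the model, the live-object counter only reaches zero after `toy_barrier()` returns, which is the moment at which a teardown assertion such as "memory manager is clean" becomes safe to make.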
  8. 29 Oct, 2016 20 commits
  9. 26 Oct, 2016 2 commits
  10. 25 Oct, 2016 1 commit