1. 05 Sep, 2017 (1 commit)
    • drm/i915: Add __rcu to radix tree slot pointer · c23aa71b
      By Ville Syrjälä
      radix_tree_for_each_slot() wants an __rcu annotated pointer for the
      slot. So let's add the annotation.
      
      Fixes the following sparse warnings:
      i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
      i915_gem.c:2217:9:    expected void **slot
      i915_gem.c:2217:9:    got void [noderef] <asn:4>**
      i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
      i915_gem.c:2217:9:    expected void **slot
      i915_gem.c:2217:9:    got void [noderef] <asn:4>**
      i915_gem.c:2217:9: warning: incorrect type in argument 1 (different address spaces)
      i915_gem.c:2217:9:    expected void [noderef] <asn:4>**slot
      i915_gem.c:2217:9:    got void **slot
      i915_gem.c:2217:9: warning: incorrect type in assignment (different address spaces)
      i915_gem.c:2217:9:    expected void **slot
      i915_gem.c:2217:9:    got void [noderef] <asn:4>**
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Fixes: 96d77634 ("drm/i915: Use a radixtree for random access to the object's backing storage")
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170901171252.31025-1-ville.syrjala@linux.intel.com
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
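      As a minimal sketch (a hypothetical walker, not the actual i915 hunk), the pattern the annotation applies to looks like this:

```c
#include <linux/radix-tree.h>

/* Hypothetical walker: the slot pointer handed to
 * radix_tree_for_each_slot() lives in the RCU address space, so it
 * must be declared __rcu; a plain "void **slot" is what sparse
 * flagged in i915_gem.c.
 */
static unsigned long count_entries(struct radix_tree_root *root)
{
	struct radix_tree_iter iter;
	void __rcu **slot;	/* the annotation this patch adds */
	unsigned long count = 0;

	/* the caller is assumed to hold rcu_read_lock() or the tree lock */
	radix_tree_for_each_slot(slot, root, &iter, 0)
		count++;

	return count;
}
```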
  2. 30 Aug, 2017 (1 commit)
  3. 29 Aug, 2017 (4 commits)
  4. 24 Aug, 2017 (3 commits)
  5. 18 Aug, 2017 (1 commit)
    • drm/i915: Replace execbuf vma ht with an idr · d1b48c1e
      By Chris Wilson
      This was the competing idea long ago, but it was only with the rewrite
      of the idr as a radixtree and using the radixtree directly ourselves,
      along with the realisation that we can store the vma directly in the
      radixtree and only need a list for the reverse mapping, that made the
      patch performant enough to displace using a hashtable. Though the vma ht
      is fast and doesn't require any extra allocation (as we can embed the node
      inside the vma), it does require a thread for resizing and serialization
      and will have the occasional slow lookup. That is hairy enough to
      justify investigating alternatives, and favouring them if they are
      equivalent in peak performance.
      One advantage of allocating an indirection entry is that we can support a
      single shared bo between many clients, something that was done on a
      first-come, first-served basis for shared GGTT vma previously. To offset
      the extra allocations, we create yet another kmem_cache for them.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20170816085210.4199-5-chris@chris-wilson.co.uk
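      A hedged sketch of the shape of the change; the identifiers below are illustrative, not the i915 code:

```c
#include <linux/gfp.h>
#include <linux/idr.h>
#include <linux/types.h>

struct my_vma;			/* stand-in for the driver's vma type */

static DEFINE_IDR(handles_vma);	/* per-context in the real driver */

/* Record a vma at exactly 'handle'; returns the id on success or a
 * negative errno (-ENOSPC if the handle is already in use). */
static int lut_insert(u32 handle, struct my_vma *vma)
{
	return idr_alloc(&handles_vma, vma, handle, handle + 1, GFP_KERNEL);
}

/* The execbuf fast path: one radixtree descent, no hashing, no resize
 * worker, no occasional slow lookup. */
static struct my_vma *lut_lookup(u32 handle)
{
	return idr_find(&handles_vma, handle);
}
```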
  6. 15 Aug, 2017 (2 commits)
  7. 27 Jul, 2017 (9 commits)
  8. 21 Jul, 2017 (1 commit)
  9. 13 Jul, 2017 (1 commit)
    • drm/i915: use __GFP_RETRY_MAYFAIL · dbb32956
      By Michal Hocko
      Commit 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than
      incur the oom for gfx allocations") tried to remove the disruptive OOM
      killer because userspace should be able to cope with allocation
      failures.
      
      At the time only __GFP_NORETRY could achieve that and it turned out that
      this would fail the allocations just too easily.  So "drm/i915: Remove
      __GFP_NORETRY from our buffer allocator" removed it and hoped for a
      better solution.  __GFP_RETRY_MAYFAIL is that solution.  It will keep
      retrying the allocation until there is no more progress and we would go
      OOM. Instead we fail the allocation and let the caller deal with it.
      
      Link: http://lkml.kernel.org/r/20170623085345.11304-6-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
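      As a minimal sketch of the resulting allocation pattern (the wrapper function is illustrative):

```c
#include <linux/gfp.h>
#include <linux/mm.h>

/* Sketch: retry while reclaim makes progress, but return NULL instead
 * of invoking the OOM killer, so the failure can be reported to
 * userspace as -ENOMEM. */
static struct page *alloc_gfx_page(void)
{
	return alloc_page(GFP_KERNEL | __GFP_RETRY_MAYFAIL);
}
```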
  10. 28 Jun, 2017 (2 commits)
  11. 23 Jun, 2017 (1 commit)
    • drm/i915: Break modeset deadlocks on reset · 36703e79
      By Chris Wilson
      Trying to do a modeset from within a reset is fraught with danger. We
      can fall into a cyclic deadlock where the modeset is waiting on a
      previous modeset that is waiting on a request, and since the GPU hung
      that request completion is waiting on the reset. As modesetting doesn't
      allow its locks to be broken and restarted, or for its *own* reset
      mechanism to take over the display, we have to do something very
      evil instead. If we detect that we are stuck waiting to prepare the
      display reset (by using a very simple timeout), resort to cancelling all
      in-flight requests and throwing the user data into /dev/null, which is
      marginally better than the driver locking up and keeping that data to
      itself.
      
      This is not a fix; this is just a workaround that unbreaks machines
      until we can resolve the deadlock in a way that doesn't lose data!
      
      v2: Move the retirement from set-wedged to the i915_reset() error path,
      after which we no longer need any delayed worker cleanup for
      i915_handle_error()
      v3: C abuse for syntactic sugar
      v4: Cover all waits with the timeout to catch more driver breakage
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=99093
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170622105625.16952-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
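      A hedged sketch of the workaround's shape; every name below is a hypothetical stand-in (this is not the actual "C abuse" macro), and the timeout value is assumed:

```c
#include <linux/jiffies.h>
#include <linux/wait.h>

struct gpu;					/* stand-in device type */
static DECLARE_WAIT_QUEUE_HEAD(reset_wq);	/* illustrative */
bool display_quiesced(struct gpu *gt);		/* hypothetical predicate */
void cancel_all_requests(struct gpu *gt);	/* hypothetical helper */

static void prepare_reset(struct gpu *gt)
{
	/* "a very simple timeout": 5 seconds is an assumed value */
	if (!wait_event_timeout(reset_wq, display_quiesced(gt),
				msecs_to_jiffies(5000)))
		cancel_all_requests(gt);	/* user data -> /dev/null */
}
```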
  12. 21 Jun, 2017 (4 commits)
  13. 19 Jun, 2017 (2 commits)
    • drm/i915: Remove __GFP_NORETRY from our buffer allocator · ce2c5872
      By Chris Wilson
      I tried __GFP_NORETRY in the belief that __GFP_RECLAIM was effective. It
      struggles with handling reclaim of our dirty buffers and relies on
      reclaim via kswapd. As a result, a single pass of direct reclaim is
      unreliable when i915 occupies the majority of available memory, and the
      only means of effectively waiting on kswapd to make progress is by not
      setting the __GFP_NORETRY flag and looping. That leaves us with the
      dilemma of invoking the oomkiller instead of propagating the allocation
      failure back to userspace where it can be handled more gracefully (one
      hopes).  In the future we may have __GFP_MAYFAIL to allow repeats up until
      we genuinely run out of memory and the oomkiller would have been invoked.
      Until then, let the oomkiller wreak havoc.
      
      v2: Stop playing with side-effects of gfp flags and await __GFP_MAYFAIL
      v3: Update comments that direct reclaim only appears to be ignoring our
      dirty buffers!
      
      Fixes: 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
      Testcase: igt/gem_tiled_swapping
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Michal Hocko <mhocko@suse.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-2-chris@chris-wilson.co.uk
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      (cherry picked from commit eaf41801)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
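      In gfp terms, a minimal sketch of the before/after behaviour described above (the wrapper functions are illustrative):

```c
#include <linux/gfp.h>
#include <linux/mm.h>

static struct page *alloc_noretry(void)
{
	/* before: one pass of direct reclaim, then fail; too fragile
	 * when the dirty gfx buffers can only be written back via kswapd */
	return alloc_page(GFP_KERNEL | __GFP_NORETRY);
}

static struct page *alloc_looping(void)
{
	/* after: without __GFP_NORETRY the allocator loops, effectively
	 * waiting on kswapd; the price is a possible oomkiller visit */
	return alloc_page(GFP_KERNEL);
}
```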
    • drm/i915: Encourage our shrinker more when our shmemfs allocations fail · b8d5a9cc
      By Chris Wilson
      Commit 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than
      incur the oom for gfx allocations") made the bold decision to try and
      avoid the oomkiller by reporting -ENOMEM to userspace if our allocation
      failed after attempting to free enough buffer objects. In short, it
      appears we were giving up too easily (even before we start wondering if
      one pass of reclaim is as strong as we would like). Part of the problem
      is that if we only shrink just enough pages for our expected allocation,
      the likelihood of those pages becoming available to us is less than 100%.
      To counteract that, we ask for twice the number of pages to be made
      available. Furthermore, we allow the shrinker to pull pages from the
      active list in later passes.
      
      v2: Be a little more cautious in paging out gfx buffers, and leave that
      to a more balanced approach from shrink_slab(). Important when combined
      with "drm/i915: Start writeback from the shrinker" as anything shrunk is
      immediately swapped out and so should be more conservative.
      
      Fixes: 24f8e00a ("drm/i915: Prefer to report ENOMEM rather than incur the oom for gfx allocations")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170609110350.1767-1-chris@chris-wilson.co.uk
      (cherry picked from commit 4846bf0c)
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
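      The over-provisioning idea as a hedged sketch; gfx_shrink() and the device type are hypothetical stand-ins for the driver's shrinker entry point:

```c
#include <linux/gfp.h>
#include <linux/mm.h>

struct gfx_dev;		/* stand-in device type */
/* hypothetical: try to free up to 'nr' pages; 'active' lets later,
 * more desperate passes pull pages from the active list */
unsigned long gfx_shrink(struct gfx_dev *dev, unsigned long nr, bool active);

static struct page *alloc_backing_page(struct gfx_dev *dev)
{
	bool active = false;
	int pass;

	for (pass = 0; pass < 3; pass++) {
		struct page *page = alloc_page(GFP_KERNEL | __GFP_NORETRY);

		if (page)
			return page;
		/* ask for twice what we need: freed pages are not
		 * guaranteed to come back to us */
		gfx_shrink(dev, 2, active);
		active = true;
	}
	return NULL;	/* report -ENOMEM rather than oom */
}
```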
  14. 16 Jun, 2017 (5 commits)
    • drm/i915: Async GPU relocation processing · 7dd4f672
      By Chris Wilson
      If the user requires patching of their batch or auxiliary buffers, we
      currently make the alterations on the cpu. If they are active on the GPU
      at the time, we wait under the struct_mutex for them to finish executing
      before we rewrite the contents. This happens when shared relocation trees
      are used between different contexts with separate address spaces (the
      buffers then have different addresses in each): the 3D state must be
      adjusted between executions on each context. However, we don't need
      to use the CPU to do the relocation patching, as we could queue commands
      to the GPU to perform it and use fences to serialise the operation with
      the current activity and future - so the operation on the GPU appears
      just as atomic as performing it immediately. Performing the relocation
      rewrites on the GPU is not free: in terms of pure throughput, the number
      of relocations/s is about halved, but more importantly so is the time
      under the struct_mutex.
      
      v2: Break out the request/batch allocation for clearer error flow.
      v3: A few asserts to ensure rq ordering is maintained
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
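      A heavily hedged sketch of the idea; all types and helpers below are hypothetical, standing in for the driver's request machinery:

```c
#include <linux/dma-fence.h>
#include <linux/types.h>

struct batch;					/* stand-in types */
struct request;
struct request *queue_gpu_write(struct batch *b, u64 offset, u64 value);
void order_after(struct request *rq, struct dma_fence *fence);

static void relocate_async(struct batch *b, u64 offset, u64 addr,
			   struct dma_fence *active)
{
	/* The CPU path would be: dma_fence_wait(active, true) under
	 * struct_mutex, then patch the page directly. Instead, queue a
	 * GPU write and let the fence order it against current and
	 * future activity, so it appears just as atomic. */
	struct request *rq = queue_gpu_write(b, offset, addr);

	order_after(rq, active);
}
```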
    • drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      By Chris Wilson
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      By Chris Wilson
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
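      A simplified sketch of the lookup shape using the kernel's fixed-size hashtable; the actual patch implements its own resizable table, and the node and vma types here are illustrative:

```c
#include <linux/hashtable.h>
#include <linux/types.h>

struct my_vma;			/* stand-in vma type */

struct vma_lut {		/* illustrative node, embedded in the vma */
	struct hlist_node node;
	u32 handle;
	struct my_vma *vma;
};

static DEFINE_HASHTABLE(handle_ht, 7);	/* fixed 128 buckets here; the
					 * real table is reallocated on
					 * demand in a background task */

static struct my_vma *lookup(u32 handle)
{
	struct vma_lut *lut;

	hash_for_each_possible(handle_ht, lut, node, handle)
		if (lut->handle == handle)
			return lut->vma;  /* first link wins for shared vma */

	return NULL;
}
```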
    • drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty · 7fc92e96
      By Chris Wilson
      For ease of use (i.e. avoiding a few checks and function calls), store
      the object's cache coherency next to the cache-dirty bit.
      
      Specifically this patch aims to reduce the frequency of no-op calls to
      i915_gem_object_clflush() to counter-act the increase of such calls for
      GPU only objects in the previous patch.
      
      v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
      !cache_coherent as gcc generates much better code for the latter
      (Tvrtko)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
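      The predicate this enables, sketched with illustrative field names (not the i915 struct):

```c
#include <linux/types.h>

struct gfx_object {			/* illustrative */
	unsigned int cache_dirty:1;	/* CPU caches may hold unflushed data */
	unsigned int cache_coherent:1;	/* coherent with the GPU (e.g. llc) */
};

/* v2's point: gcc generates better code for "dirty && !coherent"
 * than for the bitwise "dirty & ~coherent". */
static bool needs_clflush(const struct gfx_object *obj)
{
	return obj->cache_dirty && !obj->cache_coherent;
}
```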
    • drm/i915: Mark CPU cache as dirty on every transition for CPU writes · e27ab73d
      By Chris Wilson
      Currently, we only mark the CPU cache as dirty if we skip a clflush.
      This leads to some confusion where we have to ask if the object is in
      the write domain or missed a clflush. If we always mark the cache as
      dirty, this becomes a much simpler question to answer.
      
      The goal remains to do as few clflushes as required and to do them as
      late as possible, in the hope of deferring the work to a kthread and not
      blocking the caller (e.g. execbuf, flips).
      
      v2: Always call clflush before GPU execution when the cache_dirty flag
      is set. This may cause some extra work on llc systems that migrate dirty
      buffers back and forth - but we do try to limit that by only setting
      cache_dirty at the end of the gpu sequence.
      
      v3: Always mark the cache as dirty upon a level change, as we need to
      invalidate any stale cachelines due to external writes.
      Reported-by: Dongwon Kim <dongwon.kim@intel.com>
      Fixes: a6a7cc4b ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: Dongwon Kim <dongwon.kim@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
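      A sketch of the invariant using the same illustrative struct as in the previous entry; the helpers are hypothetical:

```c
#include <linux/types.h>

struct gfx_object {			/* illustrative, as above */
	unsigned int cache_dirty:1;
	unsigned int cache_coherent:1;
};

/* Mark on every transition that lets the CPU write, not only when a
 * clflush was skipped... */
static void set_cpu_write_domain(struct gfx_object *obj)
{
	obj->cache_dirty = true;
}

/* ...so "do we need to flush?" has a single, simple answer, and the
 * flush itself can be deferred as late as possible. */
static void flush_before_gpu(struct gfx_object *obj)
{
	if (obj->cache_dirty && !obj->cache_coherent) {
		/* clflush the object's pages here */
	}
	obj->cache_dirty = false;
}
```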
  15. 14 Jun, 2017 (3 commits)