1. 16 June 2017, 10 commits
    • drm/i915: Allow execbuffer to use the first object as the batch · 1a71cf2f
      Committed by Chris Wilson
      Currently, the last object in the execlist is always the batch.
      However, when building the batch buffer we often know the batch object
      first and if we can use the first slot in the execlist we can emit
      relocation instructions relative to it immediately and avoid a separate
      pass to adjust the relocations to point to the last execlist slot.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
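
      A hedged userspace sketch of how the new slot-0 behaviour might be used
      (not from the patch): it assumes the I915_EXEC_BATCH_FIRST flag this
      change adds to i915_drm.h; the helper name and error handling are
      illustrative only.

      #include <stdint.h>
      #include <string.h>
      #include <sys/ioctl.h>
      #include <drm/i915_drm.h>

      static int submit_batch_first(int fd, uint32_t batch, uint32_t target,
                                    uint32_t batch_len)
      {
          struct drm_i915_gem_exec_object2 objs[2];
          struct drm_i915_gem_execbuffer2 eb;

          memset(objs, 0, sizeof(objs));
          objs[0].handle = batch;     /* batch first: relocations can be emitted
                                       * against slot 0 while the batch is built */
          objs[1].handle = target;

          memset(&eb, 0, sizeof(eb));
          eb.buffers_ptr = (uintptr_t)objs;
          eb.buffer_count = 2;
          eb.batch_len = batch_len;
          eb.flags = I915_EXEC_RENDER | I915_EXEC_BATCH_FIRST;

          return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);
      }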
    • drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Committed by Chris Wilson
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
    • drm/i915: First try the previous execbuffer location · 616d9cee
      Committed by Chris Wilson
      When choosing a slot for an execbuffer, we ideally want to use the same
      address as last time (so that we don't have to rebind it) and the same
      address as expected by the user (so that we don't have to fixup any
      relocations pointing to it). By first trying to bind at the incoming
      execbuffer->offset supplied by the user, or failing that at the currently
      bound offset, we should achieve the goal of avoiding both the rebind cost
      and the relocation penalty. However, if the object is not currently bound
      at that location, we don't want to arbitrarily unbind an object occupying
      our chosen position, so we choose to rebind/relocate the incoming object
      instead. After we report the new position back to the user, the
      relocations should have settled down on the next pass.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
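
      The preference order described above can be summarised in a small
      standalone model; every name below is illustrative, the real logic lives
      in i915_gem_execbuffer.c.

      #include <stdbool.h>
      #include <stdint.h>

      struct slot { bool bound; uint64_t start; };

      enum placement { USE_USER_OFFSET, USE_CURRENT_BINDING, REBIND_AND_RELOCATE };

      static enum placement choose_slot(const struct slot *vma, uint64_t user_offset)
      {
          if (vma->bound && vma->start == user_offset)
              return USE_USER_OFFSET;      /* no rebind, no relocation fixup */
          if (vma->bound)
              return USE_CURRENT_BINDING;  /* no rebind; report offset to user */
          return REBIND_AND_RELOCATE;      /* don't evict others; move this one */
      }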
    • drm/i915: Store a persistent reference for an object in the execbuffer cache · dade2a61
      Committed by Chris Wilson
      If we take a reference to the object/vma when it is first used in an
      execbuf, we can keep that reference until the object's file-local handle
      is closed, thereby saving a frequent ref/unref pair.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Eliminate lots of iterations over the execobjects array · 2889caa9
      Committed by Chris Wilson
      The major scaling bottleneck in execbuffer is the processing of the
      execobjects. Creating an auxiliary list is inefficient when compared to
      using the execobject array we already have allocated.
      
      Reservation is then split into phases. As we look up each VMA, we try to
      bind it back into its active location. Only if that fails do we add it to
      the unbound list for phase 2. In phase 2, we try to place all those
      objects that could not fit into their previous location, with a fallback
      to retrying all objects and evicting the VM in case of severe
      fragmentation. (This is the same as before, except that phase 1 is now
      done inline with looking up the VMA to avoid an iteration over the
      execobject array. In the ideal case, we eliminate the separate reservation
      phase.) During the reservation phase, we only evict from the VM between
      passes (rather than, as currently, while trying to fit every new VMA). In
      testing with Unreal Engine's Atlantis demo, which stresses the eviction
      logic on gen7-class hardware, this speeds up the framerate by a factor
      of 2.
      
      The second loop amalgamation is between move_to_gpu and move_to_active.
      As we always submit the request, even if incomplete, we can use the
      current request to track active VMA as we perform the flushes and
      synchronisation required.
      
      The next big advancement is to avoid copying back to the user any
      execobjects and relocations that are not changed.
      
      v2: Add a Theory of Operation spiel.
      v3: Fall back to slow relocations in preparation for flushing userptrs.
      v4: Document struct members, factor out eb_validate_vma(), add a few
      more comments to explain some magic and hide other magic behind macros.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
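
      A rough standalone model of the two-phase reservation flow described
      above; all names are illustrative and the pass/eviction structure is
      simplified compared to the driver.

      #include <stdbool.h>
      #include <stddef.h>

      struct eb_obj { bool placed; };

      static int reserve(struct eb_obj *objs, size_t count,
                         bool (*try_bind)(struct eb_obj *), void (*evict_vm)(void))
      {
          size_t unplaced = 0;

          /* Phase 1: done inline with VMA lookup, keep objects where they are. */
          for (size_t i = 0; i < count; i++) {
              objs[i].placed = try_bind(&objs[i]);
              if (!objs[i].placed)
                  unplaced++;
          }

          /* Phase 2: place the leftovers, evicting only between whole passes. */
          for (int pass = 0; unplaced && pass < 3; pass++) {
              if (pass)
                  evict_vm();
              for (size_t i = 0; i < count; i++) {
                  if (!objs[i].placed && try_bind(&objs[i])) {
                      objs[i].placed = true;
                      unplaced--;
                  }
              }
          }
          return unplaced ? -1 : 0;   /* -ENOSPC-like failure */
      }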
    • drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations · 071750e5
      Committed by Chris Wilson
      If we write a relocation into the buffer, we require our own implicit
      synchronisation added after the start of the execbuf, outside of the
      user's control. As we may end up clflushing, or doing the patch itself
      on the GPU asynchronously, we need to look at the implicit serialisation
      on obj->resv and hence must disable EXEC_OBJECT_ASYNC for this
      object.
      
      If the user does trigger a stall for relocations, we make sure the stall
      is complete enough so that the batch is not submitted before we complete
      those relocations.
      
      Fixes: 77ae9957 ("drm/i915: Enable userspace to opt-out of implicit fencing")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
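
      An illustrative fragment (not the driver's code) of the rule above: once
      the kernel itself has to write a relocation into an object, the user's
      ASYNC opt-out is dropped so the implicit fences on obj->resv are honoured.
      EXEC_OBJECT_ASYNC is the real uapi flag; the helper name is ours.

      #include <drm/i915_drm.h>

      static void relocation_needs_fencing(struct drm_i915_gem_exec_object2 *entry)
      {
          entry->flags &= ~EXEC_OBJECT_ASYNC;
      }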
    • drm/i915: Pass vma to relocate entry · 507d977f
      Committed by Chris Wilson
      We can simplify our tracking of pending writes in an execbuf to the
      single bit in the vma->exec_entry->flags, but that requires the
      relocation function knowing the object's vma. Pass it along.
      
      Note we have only been using a single bit to track flushing since
      
      commit cc889e0f
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Wed Jun 13 20:45:19 2012 +0200
      
          drm/i915: disable flushing_list/gpu_write_list
      
      unconditionally flushed all render caches before the breadcrumb and
      
      commit 6ac42f41
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sat Jul 21 12:25:01 2012 +0200
      
          drm/i915: Replace the complex flushing logic with simple invalidate/flush all
      
      did away with the explicit GPU domain tracking. This was then codified
      into the ABI with NO_RELOC in
      
      commit ed5982e6
      Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
      Date:   Thu Jan 17 22:23:36 2013 +0100
      
          drm/i915: Allow userspace to hint that the relocations were known
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    • drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Committed by Chris Wilson
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered, but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
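
      A toy model of the per-context handle -> vma cache described above; the
      driver's table is resizable and rebuilt in a background task, which this
      sketch omits, and every name here is illustrative.

      #include <stddef.h>
      #include <stdint.h>

      struct vma;    /* opaque for the purposes of the sketch */

      struct ht_entry {
          uint32_t handle;
          struct vma *vma;
          struct ht_entry *next;
      };

      struct handle_cache {
          struct ht_entry *buckets[256];
      };

      static struct vma *lookup_vma(const struct handle_cache *c, uint32_t handle)
      {
          for (struct ht_entry *e = c->buckets[handle & 255]; e; e = e->next)
              if (e->handle == handle)
                  return e->vma;   /* fast path: handle -> vma in one step */
          return NULL;             /* slow path: resolve via the object lookup */
      }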
    • drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty · 7fc92e96
      Committed by Chris Wilson
      For ease of use (i.e. avoiding a few checks and function calls), store
      the object's cache coherency next to the cache-dirty bit.
      
      Specifically this patch aims to reduce the frequency of no-op calls to
      i915_gem_object_clflush() to counter-act the increase of such calls for
      GPU only objects in the previous patch.
      
      v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
      !cache_coherent as gcc generates much better code for the latter
      (Tvrtko)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
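
      The v2 note is a plain C codegen observation: for single-bit fields the
      logical form lets the compiler short-circuit instead of masking and
      negating. A minimal sketch with an illustrative struct:

      #include <stdbool.h>

      struct obj_flags {
          unsigned cache_dirty : 1;
          unsigned cache_coherent : 1;
      };

      static bool needs_clflush_bitwise(struct obj_flags f)
      {
          return f.cache_dirty & ~f.cache_coherent;   /* original form */
      }

      static bool needs_clflush_logical(struct obj_flags f)
      {
          return f.cache_dirty && !f.cache_coherent;  /* preferred form */
      }

      Both expressions are equivalent for 0/1-valued bitfields; only the
      generated code differs.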
    • drm/i915: Mark CPU cache as dirty on every transition for CPU writes · e27ab73d
      Committed by Chris Wilson
      Currently, we only mark the CPU cache as dirty if we skip a clflush.
      This leads to some confusion where we have to ask if the object is in
      the write domain or missed a clflush. If we always mark the cache as
      dirty, this becomes a much simpler question to answer.

      The goal remains to do as few clflushes as required and to do them as
      late as possible, in the hope of deferring the work to a kthread and not
      blocking the caller (e.g. execbuf, flips).
      
      v2: Always call clflush before GPU execution when the cache_dirty flag
      is set. This may cause some extra work on llc systems that migrate dirty
      buffers back and forth - but we do try to limit that by only setting
      cache_dirty at the end of the gpu sequence.
      
      v3: Always mark the cache as dirty upon a level change, as we need to
      invalidate any stale cachelines due to external writes.
      Reported-by: Dongwon Kim <dongwon.kim@intel.com>
      Fixes: a6a7cc4b ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dongwon Kim <dongwon.kim@intel.com>
      Cc: Matt Roper <matthew.d.roper@intel.com>
      Tested-by: Dongwon Kim <dongwon.kim@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
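
      A toy model of the policy above (illustrative names, not the driver's
      structures): every transition that lets the CPU write, and every
      cache-level change, sets the dirty bit; the flush itself happens later.

      #include <stdbool.h>

      struct tracked_obj { bool cache_dirty; int cache_level; };

      static void set_to_cpu_write_domain(struct tracked_obj *obj)
      {
          obj->cache_dirty = true;         /* always, even if a clflush follows */
      }

      static void set_cache_level(struct tracked_obj *obj, int level)
      {
          if (obj->cache_level != level) {
              obj->cache_level = level;
              obj->cache_dirty = true;     /* stale lines may exist after external writes */
          }
      }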
  2. 15 June 2017, 3 commits
  3. 18 May 2017, 1 commit
  4. 15 April 2017, 1 commit
  5. 29 March 2017, 1 commit
  6. 27 March 2017, 1 commit
  7. 14 March 2017, 1 commit
  8. 03 March 2017, 1 commit
  9. 25 February 2017, 1 commit
    • drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters. · ef0f411f
      Committed by Kenneth Graunke
      This patch makes the I915_PARAM_HAS_EXEC_CONSTANTS getparam return 0
      (indicating the optional feature is not supported), and makes execbuf
      always return -EINVAL if the flags are used.
      
      Apparently, no userspace ever shipped which used this optional feature:
      I checked the git history of Mesa, xf86-video-intel, libva, and Beignet,
      and there were zero commits showing a use of these flags.  Kernel commit
      72bfa19c apparently introduced the feature prematurely.  According
      to Chris, the intention was to use this in cairo-drm, but "the use was
      broken for gen6", so I don't think it ever happened.
      
      'relative_constants_mode' has always been tracked per-device, but this
      has actually been wrong ever since hardware contexts were introduced, as
      the INSTPM register is saved (and automatically restored) as part of the
      render ring context. The software per-device value could therefore get
      out of sync with the hardware per-context value.  This meant that using
      them is actually unsafe: a client which tried to use them could damage
      the state of other clients, causing the GPU to interpret their BO
      offsets as absolute pointers, leading to bogus memory reads.
      
      These flags were also never ported to execlist mode, making them no-ops
      on Gen9+ (which requires execlists), and Gen8 in the default mode.
      
      On Gen8+, userspace can write these registers directly, achieving the
      same effect.  On Gen6-7.5, it likely makes sense to extend the command
      parser to support them.  I don't think anyone wants this on Gen4-5.
      
      Based on a patch by Dave Gordon.
      
      v3: Return -ENODEV for the getparam, as this is what we do for other
          obsolete features.  Suggested by Chris Wilson.
      
      Cc: stable@vger.kernel.org
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92448
      Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170215093446.21291-1-kenneth@whitecape.org
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
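
      A hedged userspace sketch of probing the now-removed feature; the ioctl,
      struct and parameter are the real uapi, the helper name is ours. After
      this change the query fails, which callers should treat as "not
      supported".

      #include <sys/ioctl.h>
      #include <drm/i915_drm.h>

      static int has_exec_constants(int fd)
      {
          int value = 0;
          struct drm_i915_getparam gp = {
              .param = I915_PARAM_HAS_EXEC_CONSTANTS,
              .value = &value,
          };

          if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
              return 0;    /* e.g. ENODEV on kernels with this patch */
          return value;
      }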
  10. 22 February 2017, 2 commits
  11. 21 February 2017, 2 commits
  12. 14 February 2017, 1 commit
    • drm/i915: Emit to ringbuffer directly · 73dec95e
      Committed by Tvrtko Ursulin
      This removes the usage of intel_ring_emit in favour of
      directly writing to the ring buffer.
      
      intel_ring_emit was preventing the compiler from optimising the
      fetch and increment of the current ring buffer pointer, and
      therefore generated very verbose code for every write.
      
      It had no useful purpose since all ringbuffer operations
      are started and ended with intel_ring_begin and
      intel_ring_advance respectively, with no bail out in the
      middle possible, so it is fine to increment the tail in
      intel_ring_begin and let the code manage the pointer
      itself.
      
      Useless instruction removal amounts to approximately
      two and a half kilobytes of saved text on my build.

      Not sure if this has any measurable performance
      implications, but executing a ton of useless instructions
      on fast paths cannot be good.
      
      v2:
       * Change return from intel_ring_begin to error pointer by
         popular demand.
       * Move tail increment to intel_ring_advance to enable some
         error checking.
      
      v3:
       * Move tail advance back into intel_ring_begin.
       * Rebase and tidy.
      
      v4:
       * Complete rebase after a few months since v3.
      
      v5:
       * Remove unnecessary cast and fix !debug compile. (Chris Wilson)
      
      v6:
       * Make intel_ring_offset take request as well.
       * Fix recording of request postfix plus a sprinkle of asserts.
         (Chris Wilson)
      
      v7:
       * Use intel_ring_offset to get the postfix. (Chris Wilson)
       * Convert GVT code as well.
      
      v8:
       * Rename *out++ to *cs++.
      
      v9:
       * Fix GVT out to cs conversion in GVT.
      
      v10:
       * Rebase for new intel_ring_begin in selftests.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Zhi Wang <zhi.a.wang@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170214113242.29241-1-tvrtko.ursulin@linux.intel.com
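
      A standalone model of the new emission pattern: begin returns a cursor
      into the ring, callers write through *cs++, and advance records the new
      tail. The real intel_ring_begin()/intel_ring_advance() handle wrapping,
      waiting and error pointers; this toy version only shows the shape of the
      code.

      #include <stddef.h>
      #include <stdint.h>

      struct toy_ring { uint32_t buf[1024]; size_t tail; };

      static uint32_t *ring_begin(struct toy_ring *ring, size_t dwords)
      {
          if (ring->tail + dwords > 1024)
              return NULL;                 /* real code waits or wraps here */
          return ring->buf + ring->tail;
      }

      static void ring_advance(struct toy_ring *ring, const uint32_t *cs)
      {
          ring->tail = cs - ring->buf;     /* tail managed once, not per write */
      }

      static int emit_four_noops(struct toy_ring *ring)
      {
          uint32_t *cs = ring_begin(ring, 4);

          if (!cs)
              return -1;
          *cs++ = 0;      /* MI_NOOP */
          *cs++ = 0;
          *cs++ = 0;
          *cs++ = 0;
          ring_advance(ring, cs);
          return 0;
      }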
  13. 08 February 2017, 2 commits
  14. 04 February 2017, 1 commit
  15. 03 February 2017, 1 commit
    • drm: Improve drm_mm search (and fix topdown allocation) with rbtrees · 4e64e553
      Committed by Chris Wilson
      The drm_mm range manager claimed to support top-down insertion, but it
      was neither searching for the top-most hole that could fit the
      allocation request nor fitting the request to the hole correctly.
      
      In order to search the range efficiently, we create a secondary index
      for the holes using either their size or their address. This index
      allows us to find the smallest hole or the hole at the bottom or top of
      the range efficiently, whilst keeping the hole stack to rapidly service
      evictions.
      
      v2: Search for holes both high and low. Rename flags to mode.
      v3: Discover rb_entry_safe() and use it!
      v4: Kerneldoc for enum drm_mm_insert_mode.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Alexandre Courbot <gnurou@gmail.com>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Sinclair Yeh <syeh@vmware.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com> # vmwgfx
      Reviewed-by: Lucas Stach <l.stach@pengutronix.de> # etnaviv
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170202210438.28702-1-chris@chris-wilson.co.uk
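
      A kernel-side fragment sketching the new interface; it assumes the
      post-patch drm_mm_insert_node_in_range() signature and the
      DRM_MM_INSERT_HIGH mode, and the wrapper name is ours.

      #include <drm/drm_mm.h>

      static int place_topdown(struct drm_mm *mm, struct drm_mm_node *node,
                               u64 size, u64 align, u64 start, u64 end)
      {
          /* ask the range manager for the topmost hole that fits */
          return drm_mm_insert_node_in_range(mm, node, size, align,
                                             0 /* color */, start, end,
                                             DRM_MM_INSERT_HIGH);
      }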
  16. 28 January 2017, 2 commits
  17. 19 January 2017, 1 commit
  18. 11 January 2017, 1 commit
  19. 31 December 2016, 1 commit
  20. 19 December 2016, 1 commit
  21. 06 December 2016, 2 commits
    • drm/i915: Fix i915_gem_evict_for_vma (soft-pinning) · 172ae5b4
      Committed by Chris Wilson
      Soft-pinning depends upon being able to check for availability of an
      interval and evict overlapping objects from a drm_mm range manager very
      quickly. Currently it uses a linear list, and so performance is dire and
      not suitable as a general replacement. Worse, the current code will oops
      if it tries to evict an active buffer.
      
      It also helps if the routine reports the correct error codes as expected
      by its callers and emits a tracepoint upon use.
      
      For posterity since the wrong patch was pushed (i.e. that missed these
      key points and had known bugs), this is the changelog that should have
      been on commit 506a8e87 ("drm/i915: Add soft-pinning API for
      execbuffer"):
      
      Userspace can pass in an offset that it presumes the object is located
      at. The kernel will then do its utmost to fit the object into that
      location. The assumption is that userspace is handling its own object
      locations (for example along with full-ppgtt) and that the kernel will
      rarely have to make space for the user's requests.
      
      This extends the DRM_IOCTL_I915_GEM_EXECBUFFER2 to do the following:
      * if the user supplies a virtual address via the execobject->offset
        *and* sets the EXEC_OBJECT_PINNED flag in execobject->flags, then
        that object is placed at that offset in the address space selected
        by the context specifier in execbuffer.
      * the location must be aligned to the GTT page size, 4096 bytes
      * as the object is placed exactly as specified, it may be used by this
        execbuffer call without relocations pointing to it
      
      It may fail to do so if:
      * EINVAL is returned if the object does not have a 4096 byte aligned
        address
      * the object conflicts with another pinned object (either pinned by
        hardware in that address space, e.g. scanouts in the aliasing ppgtt)
        or within the same batch.
        EBUSY is returned if the location is pinned by hardware
        EINVAL is returned if the location is already in use by the batch
      * EINVAL is returned if the object conflicts with its own alignment (as meets
        the hardware requirements) or if the placement of the object does not fit
        within the address space
      
      All other execbuffer errors apply.
      
      Presence of this execbuf extension may be queried by passing
      I915_PARAM_HAS_EXEC_SOFTPIN to DRM_IOCTL_I915_GETPARAM and checking for
      a reported value of 1 (or greater).
      
      v2: Combine the hole/adjusted-hole ENOSPC checks
      v3: More color, more splitting, more blurb.
      
      Fixes: 506a8e87 ("drm/i915: Add soft-pinning API for execbuffer")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-2-chris@chris-wilson.co.uk
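
      A hedged userspace sketch of the ABI documented above: pin one object at
      a caller-chosen, page-aligned offset with EXEC_OBJECT_PINNED and skip
      relocations for it. The flag is the real uapi; the helper name and error
      handling are illustrative.

      #include <stdint.h>
      #include <string.h>
      #include <sys/ioctl.h>
      #include <drm/i915_drm.h>

      static int softpin_exec(int fd, uint32_t bo, uint64_t gtt_offset,
                              uint32_t batch, uint32_t batch_len)
      {
          struct drm_i915_gem_exec_object2 objs[2];
          struct drm_i915_gem_execbuffer2 eb;

          memset(objs, 0, sizeof(objs));
          objs[0].handle = bo;
          objs[0].offset = gtt_offset;     /* must be 4096-byte aligned */
          objs[0].flags = EXEC_OBJECT_PINNED;
          objs[1].handle = batch;          /* batch last, as usual */

          memset(&eb, 0, sizeof(eb));
          eb.buffers_ptr = (uintptr_t)objs;
          eb.buffer_count = 2;
          eb.batch_len = batch_len;
          eb.flags = I915_EXEC_RENDER;

          return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb);
      }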
    • drm/i915: Mark all non-vma being inserted into the address spaces · 85fd4f58
      Committed by Chris Wilson
      We need to distinguish between full i915_vma structs and simple
      drm_mm_nodes when considering eviction (i.e. we must be careful not to
      treat a mere drm_mm_node as a much larger i915_vma causing memory
      corruption, if we are lucky). To do this, color these not-a-vma with -1
      (I915_COLOR_UNEVICTABLE).
      
      v2...v200: New name for -1.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161205142941.21965-1-chris@chris-wilson.co.uk
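
      A kernel-side fragment sketching the idea: reserved ranges that are not
      backed by an i915_vma get the poison color, so the eviction code can
      tell them apart. I915_COLOR_UNEVICTABLE and its value come from the
      commit text; the helper name is ours.

      #include <drm/drm_mm.h>

      #define I915_COLOR_UNEVICTABLE (-1)

      static void mark_node_unevictable(struct drm_mm_node *node)
      {
          node->color = I915_COLOR_UNEVICTABLE;
      }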
  22. 24 November 2016, 1 commit
  23. 21 November 2016, 1 commit
  24. 18 November 2016, 1 commit