1. 12 Aug, 2017 2 commits
  2. 04 Aug, 2017 3 commits
  3. 31 Jul, 2017 1 commit
  4. 27 Jul, 2017 7 commits
  5. 21 Jul, 2017 1 commit
  6. 20 Jul, 2017 1 commit
  7. 19 Jul, 2017 1 commit
  8. 17 Jul, 2017 1 commit
  9. 13 Jul, 2017 4 commits
  10. 06 Jul, 2017 1 commit
  11. 04 Jul, 2017 1 commit
  12. 03 Jul, 2017 1 commit
  13. 28 Jun, 2017 1 commit
    •
      drm/i915: Avoid keeping waitboost active for signaling threads · 7b92c1bd
      Chris Wilson committed
      Once a client has requested a waitboost, we keep that waitboost active
      until all clients are no longer waiting. This is because we don't
      distinguish which waiter deserves the boost. However, with the advent of
      fence signaling, the signaler threads appear as waiters to the RPS
      interrupt handler. So instead of using a single boolean to track when to
      keep the waitboost active, use a counter of all outstanding waitboosted
      requests.
      
      At this point, I have removed all vestiges of the rate limiting on
      clients. Whilst this means that compositors should remain more fluid,
      it also means that boosts are more prevalent. See commit b29c19b6
      ("drm/i915: Boost RPS frequency for CPU stalls") for a longer discussion
      on the pros and cons of both approaches.
      
      A drawback of this implementation is that it requires constant request
      submission to keep the waitboost trimmed (as it is now cancelled when the
      request is completed). This will be fine for a busy system, but near
      idle the boosts may be kept for longer than desired (effectively tens of
      vblanks in the worst case) and there is a reliance on rc6 instead.
      
      v2: Remove defunct rps.client_lock
      Reported-by: Michał Winiarski <michal.winiarski@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/20170628123548.9236-1-chris@chris-wilson.co.uk
      7b92c1bd
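The switch from a single boolean to a counter of outstanding waitboosted requests can be pictured with a small user-space model. This is only a sketch under stated assumptions: `rps_model`, `request_model` and the three helper functions are hypothetical names invented for illustration and are not the i915 structures or API. The model boosts each request at most once and releases the boost when that request completes, so the boost stays active only while the counter is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RPS bookkeeping; not the i915 struct. */
struct rps_model {
    unsigned int num_waiters; /* outstanding waitboosted requests */
};

/* Hypothetical stand-in for a GPU request. */
struct request_model {
    bool waitboost; /* this request currently holds a boost */
    bool completed; /* the request has finished */
};

/* A waiter (or signaler thread) asks for a boost on behalf of rq.
 * Each request is counted at most once, however many waiters it has. */
static void rps_model_boost(struct rps_model *rps, struct request_model *rq)
{
    if (rq->completed || rq->waitboost)
        return;
    rq->waitboost = true;
    rps->num_waiters++;
}

/* The request completed: cancel its boost, if it held one. */
static void rps_model_complete(struct rps_model *rps, struct request_model *rq)
{
    rq->completed = true;
    if (!rq->waitboost)
        return;
    assert(rps->num_waiters > 0);
    rq->waitboost = false;
    rps->num_waiters--;
}

/* The boost is kept only while some boosted request is outstanding. */
static bool rps_model_boost_active(const struct rps_model *rps)
{
    return rps->num_waiters > 0;
}
```

In this model, two waiters on the same request contribute a single count, and the boost drops as soon as the last boosted request completes, which mirrors the drawback noted in the commit message: something has to complete for the boost to be trimmed.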
  14. 23 Jun, 2017 2 commits
  15. 21 Jun, 2017 6 commits
  16. 19 Jun, 2017 1 commit
  17. 16 Jun, 2017 3 commits
    •
      drm/i915: Wait upon userptr get-user-pages within execbuffer · 8a2421bd
      Chris Wilson committed
      This simply hides the EAGAIN caused by userptr when userspace causes
      resource contention. However, it is quite beneficial with highly
      contended userptr users as we avoid repeating the setup costs and
      kernel-user context switches.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
      8a2421bd
    •
      drm/i915: Eliminate lots of iterations over the execobjects array · 2889caa9
      Chris Wilson committed
      The major scaling bottleneck in execbuffer is the processing of the
      execobjects. Creating an auxiliary list is inefficient when compared to
      using the execobject array we already have allocated.
      
      Reservation is then split into phases. As we look up the VMA, we
      try to bind it back into its active location. Only if that fails do we
      add it to the unbound list for phase 2. In phase 2, we try to add all
      those objects that could not fit into their previous location, with
      fallback to retrying all objects and evicting the VM in case of severe
      fragmentation. (This is the same as before, except that phase 1 is now
      done inline with looking up the VMA to avoid an iteration over the
      execobject array. In the ideal case, we eliminate the separate reservation
      phase.) During the reservation phase, we only evict from the VM between
      passes (rather than, as currently, each time we try to fit a new VMA). In
      testing with Unreal Engine's Atlantis demo, which stresses the eviction
      logic on gen7 class hardware, this sped up the framerate by a factor of
      2.
      
      The second loop amalgamation is between move_to_gpu and move_to_active.
      As we always submit the request, even if incomplete, we can use the
      current request to track active VMA as we perform the flushes and
      synchronisation required.
      
      The next big advancement is to avoid copying back to the user any
      execobjects and relocations that are not changed.
      
      v2: Add a Theory of Operation spiel.
      v3: Fall back to slow relocations in preparation for flushing userptrs.
      v4: Document struct members, factor out eb_validate_vma(), add a few
      more comments to explain some magic and hide other magic behind macros.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      2889caa9
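The phased reservation described in commit 2889caa9 can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the driver's code: the address space is a fixed array of equal-sized slots, and `vm_model`, `obj_model` and `reserve_model` are hypothetical names. Phase 1 tries to rebind each object at its previous slot while walking the array once; phase 2 places the leftovers in any free slot; the fallback evicts the whole VM and rebinds every object.

```c
#include <assert.h>
#include <stdbool.h>

#define VM_SLOTS 8
#define NO_SLOT  (-1)
#define NO_OWNER (-1)

/* Hypothetical address space: one owner id per equal-sized slot. */
struct vm_model {
    int owner[VM_SLOTS];
};

/* Hypothetical exec object; ids are assumed unique within one call. */
struct obj_model {
    int id;
    int prev_slot; /* last known location, or NO_SLOT */
    int slot;      /* location chosen by this reservation */
};

static void vm_model_evict_all(struct vm_model *vm)
{
    for (int i = 0; i < VM_SLOTS; i++)
        vm->owner[i] = NO_OWNER;
}

/* Bind at a specific slot if it is free or already ours. */
static bool try_bind_at(struct vm_model *vm, struct obj_model *obj, int slot)
{
    if (slot < 0 || slot >= VM_SLOTS)
        return false;
    if (vm->owner[slot] != NO_OWNER && vm->owner[slot] != obj->id)
        return false;
    vm->owner[slot] = obj->id;
    obj->slot = slot;
    return true;
}

/* Bind at the first free slot, if any. */
static bool bind_anywhere(struct vm_model *vm, struct obj_model *obj)
{
    for (int slot = 0; slot < VM_SLOTS; slot++)
        if (vm->owner[slot] == NO_OWNER)
            return try_bind_at(vm, obj, slot);
    return false;
}

static bool reserve_model(struct vm_model *vm, struct obj_model *objs, int count)
{
    int unbound[VM_SLOTS];
    int n_unbound = 0;

    assert(vm && objs);
    if (count < 0 || count > VM_SLOTS)
        return false;

    /* Phase 1: done inline with the single walk over the objects. */
    for (int i = 0; i < count; i++) {
        objs[i].slot = NO_SLOT;
        if (!try_bind_at(vm, &objs[i], objs[i].prev_slot))
            unbound[n_unbound++] = i;
    }

    /* Phase 2: only the objects that did not fit are revisited. */
    for (int i = 0; i < n_unbound; i++) {
        if (bind_anywhere(vm, &objs[unbound[i]]))
            continue;

        /* Fallback: evict everything, then rebind every object. */
        vm_model_evict_all(vm);
        for (int j = 0; j < count; j++) {
            objs[j].slot = NO_SLOT;
            if (!bind_anywhere(vm, &objs[j]))
                return false;
        }
        return true;
    }
    return true;
}
```

When every object still fits at its previous slot, the unbound list stays empty and phase 2 does no work, which corresponds to the ideal case mentioned in the commit message.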
    •
      drm/i915: Store a direct lookup from object handle to vma · 4ff4b44c
      Chris Wilson committed
      The advent of full-ppgtt led to an extra indirection between the object
      and its binding. That extra indirection has a noticeable impact on how
      fast we can convert from the user handles to our internal vma for
      execbuffer. In order to bypass the extra indirection, we use a
      resizable hashtable to jump from the object to the per-ctx vma.
      rhashtable was considered but we don't need the online resizing feature
      and the extra complexity proved to undermine its usefulness. Instead, we
      simply reallocate the hashtable on demand in a background task and
      serialize it before iterating.
      
      In non-full-ppgtt modes, multiple files and multiple contexts can share
      the same vma. This leads to having multiple possible handle->vma links,
      so we only use the first to establish the fast path. The majority of
      buffers are not shared and so we should still be able to realise
      speedups with multiple clients.
      
      v2: Prettier names, more magic.
      v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      4ff4b44c
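The direct handle-to-vma lookup from commit 4ff4b44c can be sketched as a small chained hash table that grows on demand. Everything here is an illustrative assumption: `lut_model`, `vma_model` and the `lut_*` functions are hypothetical names, the multiplicative hash is a generic choice, and the table is resized synchronously on insert, whereas the commit describes reallocating in a background task and serializing with it before iterating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-context vma, keyed by the user handle. */
struct vma_model {
    uint32_t handle;
    struct vma_model *next; /* chain within one bucket */
};

/* Hypothetical lookup table with 1 << bits buckets. */
struct lut_model {
    struct vma_model **buckets;
    unsigned int bits;
    unsigned int count;
};

/* Generic multiplicative hash; requires 1 <= bits <= 16 here. */
static unsigned int lut_hash(uint32_t handle, unsigned int bits)
{
    uint32_t h = (uint32_t)(handle * UINT32_C(2654435761));
    return (unsigned int)(h >> (32 - bits));
}

static int lut_init(struct lut_model *lut, unsigned int bits)
{
    assert(bits >= 1 && bits <= 16);
    lut->buckets = calloc((size_t)1 << bits, sizeof(*lut->buckets));
    if (!lut->buckets)
        return -1;
    lut->bits = bits;
    lut->count = 0;
    return 0;
}

/* Reallocate with a new size and rehash every entry.
 * On allocation failure the old table is simply kept. */
static void lut_resize(struct lut_model *lut, unsigned int bits)
{
    size_t old_size = (size_t)1 << lut->bits;
    struct vma_model **fresh = calloc((size_t)1 << bits, sizeof(*fresh));

    if (!fresh)
        return;
    for (size_t i = 0; i < old_size; i++) {
        struct vma_model *vma = lut->buckets[i];
        while (vma) {
            struct vma_model *next = vma->next;
            unsigned int idx = lut_hash(vma->handle, bits);
            vma->next = fresh[idx];
            fresh[idx] = vma;
            vma = next;
        }
    }
    free(lut->buckets);
    lut->buckets = fresh;
    lut->bits = bits;
}

static void lut_insert(struct lut_model *lut, struct vma_model *vma)
{
    unsigned int idx;

    /* Grow once the load factor reaches 1 (synchronous in this sketch). */
    if (lut->count >= (1u << lut->bits) && lut->bits < 16)
        lut_resize(lut, lut->bits + 1);

    idx = lut_hash(vma->handle, lut->bits);
    vma->next = lut->buckets[idx];
    lut->buckets[idx] = vma;
    lut->count++;
}

/* Jump straight from the user handle to the vma, or NULL if absent. */
static struct vma_model *lut_lookup(const struct lut_model *lut, uint32_t handle)
{
    struct vma_model *vma = lut->buckets[lut_hash(handle, lut->bits)];

    while (vma && vma->handle != handle)
        vma = vma->next;
    return vma;
}
```

The point of the design is that a lookup touches one bucket and a short chain, rather than going from handle to object and then searching the object's bindings for the one belonging to the current context.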
  18. 15 Jun, 2017 3 commits