1. 16 Mar, 2016 2 commits
  2. 04 Mar, 2016 1 commit
  3. 01 Mar, 2016 1 commit
    • drm/i915: Execlists small cleanups and micro-optimisations · c6a2ac71
      Committed by Tvrtko Ursulin
      Assorted changes covering code cleanup, reduction of invariant
      conditionals in the interrupt handler, reduced lock contention,
      and MMIO access optimisation.
      
       * Remove needless initialization.
       * Improve cache locality by reorganizing code and/or using
         branch hints to keep unexpected or error conditions out
         of line.
       * Favor busy submit path vs. empty queue.
       * Less branching in hot-paths.
      
      v2:
      
       * Avoid mmio reads when possible. (Chris Wilson)
       * Use natural integer size for csb indices.
       * Remove useless return value from execlists_update_context.
       * Extract 32-bit ppgtt PDPs update so it is out of line and
         shared with two callers.
       * Grab forcewake across all mmio operations to ease the
         load on uncore lock and use cheaper mmio ops.
      
      v3:
      
       * Removed some more pointless u8 data types.
       * Removed unused return from execlists_context_queue.
       * Commit message updates.
      
      v4:
       * Unclumsify the unqueue if statement. (Chris Wilson)
       * Hide forcewake from the queuing function. (Chris Wilson)
      
      Version 3 now makes the irq handling code path ~20% smaller on
      48-bit PPGTT hardware, and a little bit less elsewhere. Hot
      paths are mostly in-line now and hammering on the uncore
      spinlock is greatly reduced together with mmio traffic to an
      extent.
      
      Benchmarking with "gem_latency -n 100" (keep submitting
      batches with 100 nop instructions) shows approximately 4% higher
      throughput, 2% less CPU time and 22% smaller latencies. This was
      on a big-core, while small-cores could benefit even more.
      
      The most likely reasons for the improvements are the MMIO
      optimization and uncore lock traffic reduction.
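
The lock traffic reduction described above can be illustrated with a small userspace sketch. Everything here is a hypothetical mock (`mock_uncore`, `mock_read_locked`, `mock_read_raw` and the two `sum_*` helpers are invented for illustration and are not the i915 API); it only shows why taking a lock once around a group of register accesses costs fewer lock round-trips than taking it per access.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of a register block guarded by one lock. */
struct mock_uncore {
	uint32_t regs[16];
	unsigned int lock_acquisitions; /* how many times the lock was taken */
	int held;                       /* 1 while the lock is held */
};

static void mock_lock(struct mock_uncore *u)
{
	assert(!u->held);
	u->held = 1;
	u->lock_acquisitions++;
}

static void mock_unlock(struct mock_uncore *u)
{
	assert(u->held);
	u->held = 0;
}

/* Raw accessor: cheap, but the caller must already hold the lock. */
static uint32_t mock_read_raw(struct mock_uncore *u, unsigned int reg)
{
	assert(u->held);
	return u->regs[reg];
}

/* Safe accessor: takes and drops the lock around every single read. */
static uint32_t mock_read_locked(struct mock_uncore *u, unsigned int reg)
{
	uint32_t val;

	mock_lock(u);
	val = mock_read_raw(u, reg);
	mock_unlock(u);
	return val;
}

/* Sum the first n registers, one lock round-trip per register. */
static uint32_t sum_per_access(struct mock_uncore *u, unsigned int n)
{
	uint32_t sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += mock_read_locked(u, i);
	return sum;
}

/* Sum the first n registers, holding the lock once across all reads. */
static uint32_t sum_batched(struct mock_uncore *u, unsigned int n)
{
	uint32_t sum = 0;

	mock_lock(u);
	for (unsigned int i = 0; i < n; i++)
		sum += mock_read_raw(u, i);
	mock_unlock(u);
	return sum;
}
```

Both helpers return the same value; the batched variant takes the lock once regardless of how many registers are read.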
      
      One odd result is with "gem_latency -n 0" (dispatching empty
      batches) which shows 5% more throughput, 8% less CPU time,
      25% better producer and consumer latencies, but 15% higher
      dispatch latency, which is as yet unexplained.
      
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
  4. 26 Feb, 2016 3 commits
  5. 29 Jan, 2016 4 commits
  6. 25 Jan, 2016 2 commits
  7. 21 Jan, 2016 5 commits
  8. 18 Jan, 2016 3 commits
  9. 15 Jan, 2016 1 commit
  10. 13 Jan, 2016 3 commits
  11. 12 Jan, 2016 1 commit
  12. 09 Jan, 2016 1 commit
  13. 07 Jan, 2016 3 commits
  14. 05 Jan, 2016 3 commits
  15. 30 Dec, 2015 1 commit
  16. 21 Dec, 2015 1 commit
  17. 12 Dec, 2015 1 commit
    • drm/i915: mark GEM object pages dirty when mapped & written by the CPU · 033908ae
      Committed by Dave Gordon
      In various places, a single page of a (regular) GEM object is mapped into
      CPU address space and updated. In each such case, either the page or
      the object should be marked dirty, to ensure that the modifications are
      not discarded if the object is evicted under memory pressure.
      
      The typical sequence is:
      	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
      	*(va+offset) = ...
      	kunmap_atomic(va);
      
      Here we introduce i915_gem_object_get_dirty_page(), which performs the
      same operation as i915_gem_object_get_page() but with the side-effect
      of marking the returned page dirty in the pagecache.  This will ensure
      that if the object is subsequently evicted (due to memory pressure),
      the changes are written to backing store rather than discarded.
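
The effect of marking a page dirty can be sketched in userspace as follows. This is a hypothetical mock, not the driver code: `mock_object`, `mock_get_page`, `mock_get_dirty_page` and `mock_evict_and_reload` are invented names, and eviction is modelled, as an assumption, as "write back dirty pages, then reload everything from the backing store".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MOCK_PAGE_SIZE 64
#define MOCK_NPAGES 4

/* Hypothetical in-memory page with a dirty flag. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
	bool dirty;
};

/* Hypothetical object: resident pages plus a backing store copy. */
struct mock_object {
	struct mock_page pages[MOCK_NPAGES];
	unsigned char backing[MOCK_NPAGES][MOCK_PAGE_SIZE];
};

/* Plain lookup: returns the page without touching its dirty flag. */
static struct mock_page *mock_get_page(struct mock_object *obj, unsigned int n)
{
	assert(n < MOCK_NPAGES);
	return &obj->pages[n];
}

/* Same lookup, with the side effect of marking the page dirty. */
static struct mock_page *mock_get_dirty_page(struct mock_object *obj,
					     unsigned int n)
{
	struct mock_page *page = mock_get_page(obj, n);

	page->dirty = true;
	return page;
}

/* Model eviction: only dirty pages are written back before the reload. */
static void mock_evict_and_reload(struct mock_object *obj)
{
	for (unsigned int n = 0; n < MOCK_NPAGES; n++) {
		struct mock_page *page = &obj->pages[n];

		if (page->dirty)
			memcpy(obj->backing[n], page->data, MOCK_PAGE_SIZE);
		memcpy(page->data, obj->backing[n], MOCK_PAGE_SIZE);
		page->dirty = false;
	}
}
```

In this model, a write made through `mock_get_page()` is lost across `mock_evict_and_reload()`, while a write made through `mock_get_dirty_page()` survives it.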
      
      Note that it works only for regular (shmfs-backed) GEM objects, but (at
      least for now) those are the only ones that are updated in this way --
      the objects in question are contexts and batchbuffers, which are always
      shmfs-backed.
      
      Separate patches deal with the cases where whole objects are (or may
      be) dirtied.
      
      v3: Mark two more pages dirty in the page-boundary-crossing
          cases of the execbuffer relocation code [Chris Wilson]
      Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  18. 10 Dec, 2015 1 commit
  19. 05 Dec, 2015 1 commit
  20. 03 Dec, 2015 1 commit
    • drm/i915: Extend LRC pinning to cover GPU context writeback · 6d65ba94
      Committed by Nick Hoath
      Use the first retired request on a new context to unpin
      the old context. This ensures that the hw context remains
      bound until it has been written back to by the GPU.
      Now that the context is pinned until later in the request/context
      lifecycle, it no longer needs to be pinned from context_queue to
      retire_requests.
      This fixes an issue with GuC submission where the GPU might not
      have finished writing back the context before it is unpinned. This
      results in a GPU hang.
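
The pin lifetime described above can be sketched as a simplified userspace model. It is an assumption-laden illustration rather than the driver's implementation: `mock_context`, `mock_engine`, `mock_submit` and `mock_retire` are hypothetical names, and the `dirty` flag here simply means "pinned because the hardware may still write this context back".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context state: pin count plus a "may still be written" flag. */
struct mock_context {
	int pin_count;
	bool dirty;
};

/* Hypothetical engine: remembers the context of the last retired request. */
struct mock_engine {
	struct mock_context *last_context;
};

/* On submission, pin the context once and keep it pinned while dirty. */
static void mock_submit(struct mock_context *ctx)
{
	if (!ctx->dirty) {
		ctx->pin_count++;
		ctx->dirty = true;
	}
}

/*
 * On retirement, a request from a different context shows that the
 * hardware has moved on, so the previous context can now be unpinned.
 */
static void mock_retire(struct mock_engine *engine, struct mock_context *ctx)
{
	struct mock_context *old = engine->last_context;

	if (old && old != ctx && old->dirty) {
		assert(old->pin_count > 0);
		old->pin_count--;
		old->dirty = false;
	}
	engine->last_context = ctx;
}
```

In this model, retiring a context's own request leaves it pinned; it is released only when the first request from a different context retires on the same engine.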
      
      v2: Moved the new pin to cover GuC submission (Alex Dai)
          Moved the new unpin to request_retire to fix coverage leak
      v3: Added switch to default context if freeing a still pinned
          context just in case the hw was actually still using it
      v4: Unwrapped context unpin to allow calling without a request
      v5: Only create a switch to idle context if the ring doesn't
          already have a request pending on it (Alex Dai)
          Rename unsaved to dirty to avoid double negatives (Dave Gordon)
          Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
          Split out per engine cleanup from context_free as it
          was getting unwieldy
          Corrected locking (Dave Gordon)
      v6: Removed some bikeshedding (Mika Kuoppala)
          Added explanation of the GuC hang that this fixes (Daniel Vetter)
      v7: Removed extra per request pinning from ring reset code (Alex Dai)
          Added forced ring unpin/clean in error case in context free (Alex Dai)
      Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
      Issue: VIZ-4277
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Gordon <david.s.gordon@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Alex Dai <yu.dai@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Alex Dai <yu.dai@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  21. 20 Nov, 2015 1 commit