1. 16 Mar, 2016 5 commits
  2. 04 Mar, 2016 1 commit
  3. 01 Mar, 2016 1 commit
    • drm/i915: Execlists small cleanups and micro-optimisations · c6a2ac71
      Tvrtko Ursulin authored
      Assorted changes covering code cleanup, removal of an invariant
      conditional from the interrupt handler, reduced lock contention,
      and MMIO access optimisation.
      
       * Remove needless initialization.
       * Improve cache locality by reorganizing code and/or using
         branch hints to keep unexpected or error conditions out
         of line (see the sketch after this list).
       * Favor the busy submit path over the empty-queue one.
       * Less branching in hot paths.
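
      As a hedged illustration of the branch-hint pattern (the flag and
      helper names below are hypothetical, not the exact i915 symbols):

      	/* unlikely() tells the compiler to lay the error handling
      	 * out of line, so the common case falls straight through
      	 * without a taken branch.
      	 */
      	if (unlikely(status & CTX_STATUS_ERROR)) {	/* hypothetical flag */
      		handle_error(ring);			/* hypothetical helper */
      		return;
      	}
      	submit_next_request(ring);			/* hypothetical helper */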
      
      v2:
      
       * Avoid mmio reads when possible. (Chris Wilson)
       * Use natural integer size for csb indices.
       * Remove useless return value from execlists_update_context.
       * Extract 32-bit ppgtt PDPs update so it is out of line and
         shared with two callers.
       * Grab forcewake across all mmio operations to ease the
         load on the uncore lock and use cheaper mmio ops (see
         the sketch after this list).
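
      A sketch of the forcewake-once pattern, assuming the uncore API of
      this kernel era (intel_uncore_forcewake_get__locked() and the raw
      I915_WRITE_FW() accessor, which skips the per-access forcewake
      check and locking):

      	spin_lock_irqsave(&dev_priv->uncore.lock, flags);
      	intel_uncore_forcewake_get__locked(dev_priv, FORCEWAKE_ALL);

      	/* Raw writes under a single forcewake reference instead of
      	 * independently locked, forcewake-checked accesses.
      	 */
      	I915_WRITE_FW(RING_ELSP(ring), upper_32_bits(desc[1]));
      	I915_WRITE_FW(RING_ELSP(ring), lower_32_bits(desc[1]));
      	I915_WRITE_FW(RING_ELSP(ring), upper_32_bits(desc[0]));
      	/* The write pointing at desc[0] must happen last. */
      	I915_WRITE_FW(RING_ELSP(ring), lower_32_bits(desc[0]));

      	intel_uncore_forcewake_put__locked(dev_priv, FORCEWAKE_ALL);
      	spin_unlock_irqrestore(&dev_priv->uncore.lock, flags);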
      
      v3:
      
       * Removed some more pointless u8 data types.
       * Removed unused return from execlists_context_queue.
       * Commit message updates.
      
      v4:
       * Unclumsify the unqueue if statement. (Chris Wilson)
       * Hide forcewake from the queuing function. (Chris Wilson)
      
      Version 3 makes the irq handling code path ~20% smaller on
      48-bit PPGTT hardware, and slightly smaller elsewhere. Hot
      paths are now mostly inlined, and hammering on the uncore
      spinlock, along with overall mmio traffic, is greatly reduced.
      
      Benchmarking with "gem_latency -n 100" (which keeps submitting
      batches of 100 nop instructions) shows approximately 4% higher
      throughput, 2% less CPU time and 22% smaller latencies. This was
      measured on a big core; small cores could benefit even more.
      
      The most likely reasons for the improvements are the MMIO
      optimisation and the reduction in uncore lock traffic.
      
      One odd result is with "gem_latency -n 0" (dispatching empty
      batches), which shows 5% more throughput, 8% less CPU time,
      25% better producer and consumer latencies, but 15% higher
      dispatch latency, which is as yet unexplained.

      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
  4. 26 Feb, 2016 3 commits
  5. 29 Jan, 2016 4 commits
  6. 25 Jan, 2016 2 commits
  7. 21 Jan, 2016 5 commits
  8. 18 Jan, 2016 3 commits
  9. 15 Jan, 2016 1 commit
  10. 13 Jan, 2016 3 commits
  11. 12 Jan, 2016 1 commit
  12. 09 Jan, 2016 1 commit
  13. 07 Jan, 2016 3 commits
  14. 05 Jan, 2016 3 commits
  15. 30 Dec, 2015 1 commit
  16. 21 Dec, 2015 1 commit
  17. 12 Dec, 2015 1 commit
    • drm/i915: mark GEM object pages dirty when mapped & written by the CPU · 033908ae
      Dave Gordon authored
      In various places, a single page of a (regular) GEM object is mapped into
      CPU address space and updated. In each such case, either the page or the
      object should be marked dirty, to ensure that the modifications are
      not discarded if the object is evicted under memory pressure.
      
      The typical sequence is:
      	va = kmap_atomic(i915_gem_object_get_page(obj, pageno));
      	*(va+offset) = ...
      	kunmap_atomic(va);
      
      Here we introduce i915_gem_object_get_dirty_page(), which performs the
      same operation as i915_gem_object_get_page() but with the side-effect
      of marking the returned page dirty in the pagecache.  This will ensure
      that if the object is subsequently evicted (due to memory pressure),
      the changes are written to backing store rather than discarded.
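
      A minimal sketch of such a helper, assuming the existing
      i915_gem_object_get_page() and the kernel's set_page_dirty() (the
      actual patch may add further checks, e.g. that the object really
      is shmfs-backed):

      	struct page *
      	i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, int n)
      	{
      		struct page *page = i915_gem_object_get_page(obj, n);

      		/* Tell the pagecache the page is dirty, so eviction
      		 * writes it back to shmfs instead of dropping it.
      		 */
      		set_page_dirty(page);
      		return page;
      	}

      With it, the typical sequence above becomes:

      	va = kmap_atomic(i915_gem_object_get_dirty_page(obj, pageno));
      	*(va+offset) = ...
      	kunmap_atomic(va);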
      
      Note that it works only for regular (shmfs-backed) GEM objects, but (at
      least for now) those are the only ones that are updated in this way --
      the objects in question are contexts and batchbuffers, which are always
      shmfs-backed.
      
      Separate patches deal with the cases where whole objects are (or may
      be) dirtied.
      
      v3: Mark two more pages dirty in the page-boundary-crossing
          cases of the execbuffer relocation code [Chris Wilson]
      Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1449773486-30822-2-git-send-email-david.s.gordon@intel.com
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  18. 10 Dec, 2015 1 commit