1. 16 3月, 2016 2 次提交
  2. 04 3月, 2016 1 次提交
  3. 01 3月, 2016 1 次提交
    • T
      drm/i915: Execlists small cleanups and micro-optimisations · c6a2ac71
      Tvrtko Ursulin 提交于
      Assorted changes in the areas of code cleanup, reduction of
      invariant conditional in the interrupt handler and lock
      contention and MMIO access optimisation.
      
       * Remove needless initialization.
       * Improve cache locality by reorganizing code and/or using
         branch hints to keep unexpected or error conditions out
         of line.
       * Favor busy submit path vs. empty queue.
       * Less branching in hot-paths.
      
      v2:
      
       * Avoid mmio reads when possible. (Chris Wilson)
       * Use natural integer size for csb indices.
       * Remove useless return value from execlists_update_context.
       * Extract 32-bit ppgtt PDPs update so it is out of line and
         shared with two callers.
       * Grab forcewake across all mmio operations to ease the
         load on uncore lock and use chepear mmio ops.
      
      v3:
      
       * Removed some more pointless u8 data types.
       * Removed unused return from execlists_context_queue.
       * Commit message updates.
      
      v4:
       * Unclumsify the unqueue if statement. (Chris Wilson)
       * Hide forcewake from the queuing function. (Chris Wilson)
      
      Version 3 now makes the irq handling code path ~20% smaller on
      48-bit PPGTT hardware, and a little bit less elsewhere. Hot
      paths are mostly in-line now and hammering on the uncore
      spinlock is greatly reduced together with mmio traffic to an
      extent.
      
      Benchmarking with "gem_latency -n 100" (keep submitting
      batches with 100 nop instruction) shows approximately 4% higher
      throughput, 2% less CPU time and 22% smaller latencies. This was
      on a big-core while small-cores could benefit even more.
      
      Most likely reason for the improvements are the MMIO
      optimization and uncore lock traffic reduction.
      
      One odd result is with "gem_latency -n 0" (dispatching empty
      batches) which shows 5% more throughput, 8% less CPU time,
      25% better producer and consumer latencies, but 15% higher
      dispatch latency which is yet unexplained.
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
      c6a2ac71
  4. 25 1月, 2016 1 次提交
  5. 21 1月, 2016 4 次提交
  6. 20 1月, 2016 1 次提交
  7. 18 1月, 2016 2 次提交
  8. 08 1月, 2016 1 次提交
    • M
      drm/i915: Inspect subunit states on hangcheck · 61642ff0
      Mika Kuoppala 提交于
      If head seems stuck and engine in question is rcs,
      inspect subunit state transitions from undone to done,
      before deciding that this really is a hang instead of limited
      progress. Only account the transitions of subunits from
      undone to done once, to prevent unstable subunit states
      to keep us falsely active.
      
      As this adds one extra steps to hangcheck heuristics,
      before hang is declared, it adds 1500ms to to detect hang
      for render ring to a total of 7500ms. We could sample
      the subunit states on first head stuck condition but
      decide not to do so only in order to mimic old behaviour. This
      way the check order of promotion from seqno > atchd > instdone
      is consistently done.
      
      v2: Deal with unstable done states (Arun)
          Clear instdone progress on head and seqno movement (Chris)
          Report raw and accumulated instdone's in in debugfs (Chris)
          Return HANGCHECK_ACTIVE on undone->done
      
      References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1448985372-19535-1-git-send-email-mika.kuoppala@intel.com
      61642ff0
  9. 10 12月, 2015 1 次提交
  10. 18 11月, 2015 2 次提交
  11. 29 10月, 2015 1 次提交
    • C
      drm/i915: Recover all available ringbuffer space following reset · 608c1a52
      Chris Wilson 提交于
      Having flushed all requests from all queues, we know that all
      ringbuffers must now be empty. However, since we do not reclaim
      all space when retiring the request (to prevent HEADs colliding
      with rapid ringbuffer wraparound) the amount of available space
      on each ringbuffer upon reset is less than when we start. Do one
      more pass over all the ringbuffers to reset the available space
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Cc: Dave Gordon <david.s.gordon@intel.com>
      608c1a52
  12. 04 9月, 2015 1 次提交
  13. 26 8月, 2015 1 次提交
    • I
      drm/i915/bxt: work around HW coherency issue when accessing GPU seqno · 319404df
      Imre Deak 提交于
      By running igt/store_dword_loop_render on BXT we can hit a coherency
      problem where the seqno written at GPU command completion time is not
      seen by the CPU. This results in __i915_wait_request seeing the stale
      seqno and not completing the request (not considering the lost
      interrupt/GPU reset mechanism). I also verified that this isn't a case
      of a lost interrupt, or that the command didn't complete somehow: when
      the coherency issue occured I read the seqno via an uncached GTT mapping
      too. While the cached version of the seqno still showed the stale value
      the one read via the uncached mapping was the correct one.
      
      Work around this issue by clflushing the corresponding CPU cacheline
      following any store of the seqno and preceding any reading of it. When
      reading it do this only when the caller expects a coherent view.
      
      v2:
      - fix using the proper logical && instead of a bitwise & (Jani, Mika)
      - limit the workaround to A stepping, on later steppings this HW issue
        is fixed
      v3:
      - use a separate get_seqno/set_seqno vfunc (Chris)
      
      Testcase: igt/store_dword_loop_render
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      319404df
  14. 14 7月, 2015 1 次提交
    • T
      drm/i915: Snapshot seqno of most recently submitted request. · 94f7bbe1
      Tomas Elf 提交于
      The hang checker needs to inspect whether or not the ring request list is empty
      as well as if the given engine has reached or passed the most recently
      submitted request. The problem with this is that the hang checker cannot grab
      the struct_mutex, which is required in order to safely inspect requests since
      requests might be deallocated during inspection. In the past we've had kernel
      panics due to this very unsynchronized access in the hang checker.
      
      One solution to this problem is to not inspect the requests directly since
      we're only interested in the seqno of the most recently submitted request - not
      the request itself. Instead the seqno of the most recently submitted request is
      stored separately, which the hang checker then inspects, circumventing the
      issue of synchronization from the hang checker entirely.
      
      This fixes a regression introduced in
      
      commit 44cdd6d2
      Author: John Harrison <John.C.Harrison@Intel.com>
      Date:   Mon Nov 24 18:49:40 2014 +0000
      
          drm/i915: Convert 'ring_idle()' to use requests not seqnos
      
      v2 (Chris Wilson):
      - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
      than compute it over again.
      - Remove extra whitespace.
      
      Issue: VIZ-5998
      Signed-off-by: NTomas Elf <tomas.elf@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: Add regressing commit citation provided by Chris.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      94f7bbe1
  15. 06 7月, 2015 1 次提交
  16. 03 7月, 2015 1 次提交
    • J
      drm/i915: Reserve space improvements · 79bbcc29
      John Harrison 提交于
      An earlier patch was added to reserve space in the ring buffer for the
      commands issued during 'add_request()'. The initial version was
      pessimistic in the way it handled buffer wrapping and would cause
      premature wraps and thus waste ring space.
      
      This patch updates the code to better handle the wrap case. It no
      longer enforces that the space being asked for and the reserved space
      are a single contiguous block. Instead, it allows the reserve to be on
      the far end of a wrap operation. It still guarantees that the space is
      available so when the wrap occurs, no wait will happen. Thus the wrap
      cannot fail which is the whole point of the exercise.
      
      Also fixed a merge failure with some comments from the original patch.
      
      v2: Incorporated suggestion by David Gordon to move the wrap code
      inside the prepare function and thus allow a single combined
      wait_for_space() call rather than doing one before the wrap and
      another after. This also makes the prepare code much simpler and
      easier to follow.
      
      v3: Fix for 'effective_size' vs 'size' during ring buffer remainder
      calculations (spotted by Tomas Elf).
      
      For: VIZ-5115
      CC: Daniel Vetter <daniel@ffwll.ch>
      Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: NTomas Elf <tomas.elf@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      79bbcc29
  17. 23 6月, 2015 18 次提交