1. 05 Jul, 2012 1 commit
    • D
      drm/i915: non-interruptible sleeps can't handle -EAGAIN · d6b2c790
      Daniel Vetter committed
      So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
      -EIO instead. Note that this isn't really an issue with
      interruptibility, but more that we have quite a few codepaths (mostly
      around kms stuff) that simply can't handle any errors and hence not
      even -EAGAIN. Instead of adding proper failure paths so that we could
      restart these ioctls we've opted for the cheap way out of sleeping
      non-interruptibly.  Which works everywhere but when the gpu dies,
      which this patch fixes.
      
      So essentially interruptible == false means 'wait for the gpu or die
      trying'.
      
      This patch is a bit ugly because intel_ring_begin is all non-interruptible
      and hence only returns -EIO. But as the comment in there says,
      auditing all the callsites would be a pain.
      
      To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
      and intel_wait_ring_buffer. Also use the opportunity to clarify the
      different cases in i915_gem_check_wedge a bit with comments.
      
      v2: Don't access dev_priv->mm.interruptible from check_wedge - we
      might not hold dev->struct_mutex, making this racy. Instead pass
      interruptible in as a parameter. I've noticed this because I've hit a
      BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
      added in
      
      commit b4aca010
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Wed Apr 25 20:50:12 2012 -0700
      
          drm/i915: extract some common olr+wedge code
      
      although that commit is missing any justification for this. I guess
      it's just copy&paste, because the same commit adds the same BUG_ON
      check to check_olr, where it indeed makes sense.
      
      But in check_wedge everything we access is protected by other means,
      so this is superfluous. And because it now gets in the way (we add a
      new caller in __wait_seqno, which can be called without
      dev->struct_mutex) let's just remove it.
      
      v3: Group all the i915_gem_check_wedge refactoring into this patch, so
      that this patch here is all about not returning -EAGAIN to callsites
      that can't handle syscall restarting.
      
      v4: Add clarification of what interruptible == false means in our code,
      requested by Ben Widawsky.
      
      v5: Fix EAGAIN misspelling noticed by Chris Wilson.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
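The error remapping described in this commit can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver code: `struct gpu_state`, its fields, and this `check_wedge` signature are hypothetical stand-ins for the driver's hang-tracking state and for `i915_gem_check_wedge` with its new `interruptible` parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's hang-tracking state. */
struct gpu_state {
    bool wedged;        /* a gpu hang has been detected */
    bool reset_pending; /* a reset is still in flight and may succeed */
};

/*
 * Returns 0 when the gpu is healthy. After a hang, only interruptible
 * callers with a reset still pending get -EAGAIN, because only they can
 * restart the syscall. Everyone else gets -EIO.
 */
static int check_wedge(const struct gpu_state *gpu, bool interruptible)
{
    assert(gpu != NULL);

    if (!gpu->wedged)
        return 0;

    /* Non-interruptible callers can't handle -EAGAIN. */
    if (!interruptible)
        return -EIO;

    /* Still wedged with no reset pending: the reset failed. */
    if (!gpu->reset_pending)
        return -EIO;

    return -EAGAIN;
}
```

Passing `interruptible` as an argument, rather than reading it from shared driver state, mirrors the v2 note above: the helper can then be called without holding a lock that protects that state.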
  2. 29 Jun, 2012 1 commit
  3. 14 Jun, 2012 2 commits
    • B
      drm/i915: possibly invalidate TLB before context switch · 12b0286f
      Ben Widawsky committed
      From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf
      
      [DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's
      responsibility to invalidate the TLBs at least once after the previous
      context switch after any GTT mappings changed (including new GTT
      entries).  This can be done by a pipelined PIPE_CONTROL with TLB inv bit
      set immediately before MI_SET_CONTEXT.
      
      On GEN7 the invalidation mode is explicitly set, but this appears to be
      lacking for GEN6. Since I don't know the history on this, I've decided
      to dynamically read the value at ring init time, and use that value
      throughout.
      
      v2: better comment (daniel)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
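The approach in this commit (sample the invalidation mode once at ring init, then act on the cached value before every context switch) can be sketched as a toy command queue. All names here are hypothetical; the real driver emits hardware commands and reads a hardware register rather than taking a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_LEN 8

/* Hypothetical command tags standing in for real ring commands. */
enum cmd { CMD_PIPE_CONTROL_TLB_INVALIDATE, CMD_MI_SET_CONTEXT };

struct ring_model {
    bool itlb_before_ctx_switch; /* sampled once at ring init */
    enum cmd queue[QUEUE_LEN];
    size_t count;
};

/* Cache the hardware's TLB invalidation mode when the ring is set up. */
static void ring_init(struct ring_model *ring, bool hw_tlb_invalidation_mode)
{
    assert(ring != NULL);
    ring->itlb_before_ctx_switch = hw_tlb_invalidation_mode;
    ring->count = 0;
}

/* Queue a context switch, preceded by a TLB invalidate if required. */
static int emit_context_switch(struct ring_model *ring)
{
    size_t needed = ring->itlb_before_ctx_switch ? 2 : 1;

    if (ring->count + needed > QUEUE_LEN)
        return -1; /* no room for the whole sequence */

    if (ring->itlb_before_ctx_switch)
        ring->queue[ring->count++] = CMD_PIPE_CONTROL_TLB_INVALIDATE;
    ring->queue[ring->count++] = CMD_MI_SET_CONTEXT;
    return 0;
}
```

Checking for space before queuing anything keeps the invalidate and the context switch together, so the invalidate always lands immediately before the switch, as the quoted documentation requires.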
    • B
      drm/i915: PIPE_CONTROL_TLB_INVALIDATE · cc0f6398
      Ben Widawsky committed
      This has shown up in several other patches. It's required for the next
      context workaround.
      
      I tested this one on its own and saw no differences in basic tests
      (performance or otherwise). This patch is relatively likely to cause
      regressions, hence why it's split out.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
  4. 13 Jun, 2012 1 commit
  5. 05 Jun, 2012 2 commits
  6. 31 May, 2012 1 commit
  7. 30 May, 2012 1 commit
    • C
      drm/i915: Reset last_retired_head when resetting ring · c3b20037
      Chris Wilson committed
      When we reset the ring control registers, including the HEAD and TAIL of
      the ring, we also need to reset associated state. In this instance, we
      were failing to reset the cached value of ring->last_retired_head and so
      upon the first request for more space following a resume would
      potentially (depending on a narrow race window) believe that the HEAD had
      advanced much further than reality.
      
      This is a regression from:
      
      commit a71d8d94
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Feb 15 11:25:36 2012 +0000
      
          drm/i915: Record the tail at each request and use it to estimate the head
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org # 3.4
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
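The stale-cache problem fixed by this commit can be illustrated with a minimal ring model. The struct, the `-1` sentinel, and the function names are assumptions made for the sketch and only loosely follow the commit text; the point is that resetting HEAD and TAIL without clearing the cached head lets an old estimate override the freshly reset HEAD.

```c
#include <assert.h>

#define RING_SIZE 4096

/* Hypothetical model of a ring buffer with a cached head estimate. */
struct ring_model {
    int head;
    int tail;
    int last_retired_head; /* -1 means no cached estimate */
};

/* Free space between tail and head, keeping a small safety gap. */
static int ring_space(const struct ring_model *ring)
{
    int space;

    assert(ring->tail >= 0 && ring->tail < RING_SIZE);
    space = ring->head - (ring->tail + 8);
    if (space < 0)
        space += RING_SIZE;
    return space;
}

/* When more space is requested, adopt the cached head if one exists. */
static void ring_use_retired_head(struct ring_model *ring)
{
    if (ring->last_retired_head != -1) {
        ring->head = ring->last_retired_head;
        ring->last_retired_head = -1;
    }
}

/* Reset the ring registers together with the state derived from them. */
static void ring_reset(struct ring_model *ring)
{
    ring->head = 0;
    ring->tail = 0;
    ring->last_retired_head = -1; /* the fix: drop the stale estimate */
}
```

Without the last assignment in `ring_reset`, a later call to `ring_use_retired_head` would move `head` to a position recorded before the reset, and `ring_space` would report space based on a head the hardware never reached.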
  8. 25 May, 2012 1 commit
  9. 07 May, 2012 1 commit
  10. 06 May, 2012 1 commit
    • D
      drm/i915: add interface to simulate gpu hangs · e5eb3d63
      Daniel Vetter committed
      gpu reset is a very important piece of our infrastructure.
      Unfortunately we only really test it by actually hanging the gpu,
      which often has bad side-effects for the entire system. And the gpu
      hang handling code is one of the rather complicated pieces of code we
      have, consisting of
      - hang detection
      - error capture
      - actual gpu reset
      - reset of all the gem bookkeeping
      - reinitialization of the entire gpu
      
      This patch adds a debugfs interface to selectively stop rings by ceasing
      to update the hw tail pointer, which results in the gpu no longer
      updating its head pointer and eventually in the hangcheck firing.
      This way we can exercise the gpu hang code under controlled conditions
      without a dying gpu taking down the entire system.
      
      Patch motivated by me forgetting to properly reinitialize ppgtt after
      a gpu reset.
      
      Usage:
      
      echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
      
      echo 0xffffffff > i915_ring_stop # stops all, future-proof version
      
      then run whatever testload is desired. i915_ring_stop automatically
      resets after a gpu hang is detected to avoid hanging the gpu too fast
      and declaring it wedged.
      
      v2: Incorporate feedback from Chris Wilson.
      
      v3: Add the missing cleanup.
      
      v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
      Eugeni Dodonov.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
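The stop mechanism described in this commit can be sketched as a bitmask check on the path that publishes the tail pointer. This is a simplified model under assumptions: `stop_rings`, `ring_advance`, and `hang_detected` are illustrative names, and the `hw_tail` field stands in for the hardware register write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one ring: software tail vs. what hardware sees. */
struct ring_model {
    unsigned id;      /* ring number, selects a bit in stop_rings */
    uint32_t tail;    /* software copy of the tail pointer */
    uint32_t hw_tail; /* last value written to the hardware tail */
};

/* One bit per ring, set by writing a mask such as 1 << ringnum. */
static uint32_t stop_rings;

static bool ring_stopped(const struct ring_model *ring)
{
    assert(ring->id < 32);
    return (stop_rings & (UINT32_C(1) << ring->id)) != 0;
}

/* Queue more work; a stopped ring never publishes the new tail. */
static void ring_advance(struct ring_model *ring, uint32_t bytes)
{
    ring->tail += bytes;
    if (ring_stopped(ring))
        return; /* hardware makes no progress, so hangcheck will fire */
    ring->hw_tail = ring->tail;
}

/* Once a hang is detected, clear the mask so the reset gpu can run. */
static void hang_detected(void)
{
    stop_rings = 0;
}
```

Clearing the mask on hang detection corresponds to the automatic reset mentioned above: the next batch of work runs normally instead of immediately producing a second hang.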
  11. 03 May, 2012 7 commits
  12. 28 Apr, 2012 1 commit
  13. 20 Apr, 2012 1 commit
  14. 18 Apr, 2012 3 commits
  15. 13 Apr, 2012 15 commits
  16. 11 Apr, 2012 1 commit