1. 29 Mar, 2012 3 commits
  2. 28 Mar, 2012 3 commits
  3. 27 Mar, 2012 3 commits
  4. 26 Mar, 2012 1 commit
  5. 24 Mar, 2012 1 commit
  6. 23 Mar, 2012 2 commits
  7. 21 Mar, 2012 2 commits
    • drm/i915: bind objects to the global gtt only when needed · 74898d7e
      Daniel Vetter committed
      And track the existence of such a binding similar to the aliasing
      ppgtt case. Speeds up binding/unbinding in the common case where we
      only need a ppgtt binding (which is accessed in a cpu coherent fashion
      by the gpu) and no global gtt binding (which needs uc writes for the
      ptes).
      
      This patch just puts the required tracking in place.
      
      v2: Check that global gtt mappings exist in the error_state capture
      code (with Chris Wilson's llc reloc patches batchbuffers are no longer
      relocated as mappable in all situations, so this matters). Suggested
      by Chris Wilson.
      
      v3: Adapted to Chris' latest llc-reloc patches.
      
      v4: Fix a bug in the i915 error state capture code noticed by Chris
      Wilson.

      Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: split out dma mapping from global gtt bind/unbind functions · 74163907
      Daniel Vetter committed
      Note that there's a functional change buried in this patch wrt the ilk
      dmar workaround: We now only idle the gpu while tearing down the dmar
      mappings, not while clearing the gtt. Keeping the current semantics
      would have made for some really ugly code and afaik the issue is only
      with the dmar unmapping that needs a fully idle gpu.

      Reviewed-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  8. 19 Mar, 2012 1 commit
  9. 01 Mar, 2012 4 commits
  10. 28 Feb, 2012 1 commit
  11. 15 Feb, 2012 3 commits
    • drm/i915: Record the position of the request upon error · ee4f42b1
      Chris Wilson committed
      So that we can tally the request against the command sequence in the
      ringbuffer, or merely jump to the interesting locations.

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Record the in-flight requests at the time of a hang · 52d39a21
      Chris Wilson committed
      Being able to tally the list of outstanding requests with the sequence
      of commands in the ringbuffer is often useful evidence with respect to
      driver corruption.
      
      Note that since this is the umpteenth per-ring data structure to be added
      to the error state, I've coalesced the nearby loops (the ringbuffer and
      batchbuffer) into a single structure along with the list of requests.  A
      later task would be to refactor the ring register state into the same
      structure.
      
      v2: Fix pretty printing of requests so that they are parsed correctly by
      intel_error_decode and use the 0x%08x format for seqno for consistency

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
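
The v2 note in 52d39a21 asks for seqnos to be printed with the 0x%08x format. A minimal sketch of that formatting follows. The helper name format_seqno is hypothetical and is not taken from the driver; only the format string comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: render a seqno as fixed-width, zero-padded hex
 * ("0x" plus eight digits) so every value has the same column width.
 * Returns the number of characters snprintf would have written.
 */
static int format_seqno(char *buf, size_t len, uint32_t seqno)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "0x%08x", (unsigned int)seqno);
}
```

A fixed width keeps the printed values aligned and gives a parser a single pattern to match.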
    • drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94
      Chris Wilson committed
      By recording the location of every request in the ringbuffer, we know
      that in order to retire the request the GPU must have finished reading
      it and so the GPU head is now beyond the tail of the request. We can
      therefore provide a conservative estimate of where the GPU is reading
      from in order to avoid having to read back the ring buffer registers
      when polling for space upon starting a new write into the ringbuffer.
      
      A secondary effect is that this allows us to convert
      intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
      upon the single function to handle the complicated task of waiting upon
      the GPU. A necessary precaution is that we need to make that wait
      uninterruptible to match the existing conditions as all the callers of
      intel_ring_begin() have not been audited to handle ERESTARTSYS
      correctly.
      
      By using a conservative estimate for the head, and always processing all
      outstanding requests first, we prevent a race condition between using
      the estimate and direct reads of I915_RING_HEAD which could result in
      the value of the head going backwards, and the tail overflowing once
      again. We are also careful to mark any request that we skip over in
      order to free space in ring as consumed which provides a
      self-consistency check.
      
      Given sufficient abuse, such as a set of unthrottled GPU bound
      cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
      Sandy Bridge (i5-2520m):
        firefox-paintball  18927ms -> 15646ms: 1.21x speedup
        firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
      which is a mild consolation for the performance those traces achieved from
      exploiting the buggy autoreported head.
      
      v2: Add a few more comments and make request->tail a conservative
      estimate as suggested by Daniel Vetter.

      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: resolve conflicts with retirement deferring and the lack of
      the autoreport head removal (that will go in through -fixes).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
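
The head estimate described in a71d8d94 can be illustrated with a small userspace sketch. It assumes a power-of-two ring and an 8-byte safety gap; RING_SIZE, the struct layouts, and the function names are all hypothetical and simplified, not the driver's code. The idea shown is the one from the commit message: once a request has retired, the GPU must have read past that request's tail, so the tail is a safe lower bound for the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u /* hypothetical ring size in bytes, power of two */

struct request {
    uint32_t seqno; /* sequence number the GPU reports on completion */
    uint32_t tail;  /* ring offset just past this request's commands */
};

struct ring {
    uint32_t head_estimate; /* conservative guess at the GPU read offset */
    uint32_t tail;          /* CPU write offset */
};

/* Bytes free for writing, keeping a small gap so tail never meets head. */
static uint32_t ring_space(const struct ring *r)
{
    return (r->head_estimate - r->tail - 8u) & (RING_SIZE - 1u);
}

/* True if 'completed' is at or beyond 'seqno', tolerating 32-bit wrap. */
static int seqno_passed(uint32_t completed, uint32_t seqno)
{
    return (int32_t)(completed - seqno) >= 0;
}

/*
 * Retire requests in submission order. Each retired request moves the
 * head estimate up to its recorded tail, without reading any register.
 * Returns the number of requests retired.
 */
static size_t retire_requests(struct ring *r, const struct request *reqs,
                              size_t count, uint32_t completed_seqno)
{
    size_t i = 0;

    while (i < count && seqno_passed(completed_seqno, reqs[i].seqno)) {
        r->head_estimate = reqs[i].tail;
        i++;
    }
    return i;
}
```

Because the estimate only advances as requests retire, it never claims more free space than the real head would allow, which is what makes it conservative.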
  12. 14 Feb, 2012 2 commits
  13. 13 Feb, 2012 1 commit
    • drm/i915: fixup seqno allocation logic for lazy_request · 53d227f2
      Daniel Vetter committed
      Currently we reserve seqnos only when we emit the request to the ring
      (by bumping dev_priv->next_seqno), but start using it much earlier for
      ring->outstanding_lazy_request. When 2 threads compete for the gpu and
      run on two different rings (e.g. ddx on blitter vs. compositor)
      hilarity ensued, especially when we get constantly interrupted while
      reserving buffers.
      
      Breakage seems to have been introduced in
      
      commit 6f392d54
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sat Aug 7 11:01:22 2010 +0100
      
          drm/i915: Use a common seqno for all rings.
      
      This patch fixes up the seqno reservation logic by moving it into
      i915_gem_next_request_seqno. The ring->add_request functions now
      superfluously still return the new seqno through a pointer, that will
      be refactored in the next patch.
      
      Note that with this change we now unconditionally allocate a seqno,
      even when ->add_request might fail because the rings are full and the
      gpu died. But this does not open up a new can of worms because we can
      already leave behind an outstanding_request_seqno if e.g. the caller
      gets interrupted with a signal while stalling for the gpu in the
      eviction paths. And with the bugfix we only ever have one seqno
      allocated per ring (and only that ring), so there are no ordering
      issues with multiple outstanding seqnos on the same ring.
      
      v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
      clear that we only have one seqno counter for all rings. Suggested by
      Chris Wilson.
      
      v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
      instead of ring->outstanding_lazy_request to make the follow-up
      refactoring more clearly correct. Also improve the commit message
      with issues discussed on irc.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
      Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
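
The reservation scheme from 53d227f2 can be sketched in userspace as follows. This is a simplified illustration under assumptions: the struct and function names are hypothetical stand-ins, locking is omitted, and 0 is assumed to mean "no seqno reserved". It shows the point of the fix, which is that a seqno is taken from the single shared counter at the moment a ring first needs one, so two rings can never start using the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state: one seqno counter shared by all rings. */
struct device_state {
    uint32_t next_seqno;
};

/* Hypothetical per-ring state: 0 means no seqno is currently reserved. */
struct ring_state {
    uint32_t outstanding_lazy_request;
};

/* Take the next seqno from the shared counter, skipping the reserved 0. */
static uint32_t get_seqno(struct device_state *dev)
{
    uint32_t seqno = dev->next_seqno;

    if (++dev->next_seqno == 0)
        dev->next_seqno = 1;
    return seqno;
}

/* Reserve a seqno on first use; later calls return the same reservation. */
static uint32_t next_request_seqno(struct device_state *dev,
                                   struct ring_state *ring)
{
    if (ring->outstanding_lazy_request == 0)
        ring->outstanding_lazy_request = get_seqno(dev);
    return ring->outstanding_lazy_request;
}

/* Emit the request: consume the ring's reservation and return its seqno. */
static uint32_t add_request(struct device_state *dev, struct ring_state *ring)
{
    uint32_t seqno = next_request_seqno(dev, ring);

    ring->outstanding_lazy_request = 0;
    return seqno;
}
```

With reservation deferred until emission, as in the behaviour the commit describes as broken, both rings in the example would have read the same counter value before either bumped it.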
  14. 12 Feb, 2012 1 commit
  15. 10 Feb, 2012 3 commits
  16. 09 Feb, 2012 2 commits
    • drm/i915: dump even more into the error_state · 7e3b8737
      Daniel Vetter committed
      Chris Wilson and I have again stared at funny error states and it's
      been pretty clear from the start that something was seriously amiss.
      The seqnos last seen by the cpu were a few hundred behind those that
      the gpu could have possibly emitted last before it died ...
      
      Chris now tracked it down (hopefully, definite verdict's still out),
      but in hindsight we'd have found the bug by simply dumping the cpu
      side tracking of the ring head and tail registers.
      
      Fix this and prevent an identical time-waster in the future.
      
      Because the hangs always involved semaphores in one way or another,
      we've tried to dump the mbox registers, but couldn't find any
      inconsistencies. Still, dump them too.

      Reviewed-and-wanted-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: swizzling support for snb/ivb · f691e2f4
      Daniel Vetter committed
      We have to do this manually. Somebody had a Great Idea.
      
      I've measured speed-ups just a few percent above the noise level
      (below 5% for the best case), but no slowdowns. Chris Wilson measured
      quite a bit more (10-20% above the usual snb variance) on a more
      recent and better tuned version of sna, but also recorded a few
      slow-downs on benchmarks known for uglier amounts of snb-induced
      variance.
      
      v2: Incorporate Ben Widawsky's preliminary review comments and
      elaborate a bit about the performance impact in the changelog.
      
      v3: Add a comment as to why we don't need to check the 3rd memory
      channel.
      
      v4: Fixup whitespace.

      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  17. 30 Jan, 2012 4 commits
  18. 26 Jan, 2012 1 commit
  19. 20 Jan, 2012 1 commit
    • drm/i915: protect force_wake_(get|put) with the gt_lock · 9f1f46a4
      Daniel Vetter committed
      The problem this patch solves is that the forcewake accounting
      necessary for register reads is protected by dev->struct_mutex. But the
      hangcheck and error_capture code need to access registers without
      grabbing this mutex because we hold it while waiting for the gpu.
      So a new lock is required. Because currently the error_state capture
      is called from the error irq handler and the hangcheck code runs from
      a timer, it needs to be an irqsafe spinlock (note that the irq
      handler, neglecting the error handling part, only uses registers
      that don't need the forcewake dance).
      
      We could tune this down to a normal spinlock when we rework the
      error_state capture and hangcheck code to run from a workqueue.  But
      we don't have any read in a fastpath that needs forcewake, so I've
      decided to not care much about overhead.
      
      This prevents tests/gem_hangcheck_forcewake from i-g-t from killing my
      snb on recent kernels - something must have slightly changed the
      timings. On previous kernels it only triggered a WARN about the broken
      locking.
      
      v2: Drop the previous patch for the register writes.
      
      v3: Improve the commit message per Chris Wilson's suggestions.

      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
  20. 18 Jan, 2012 1 commit