1. 01 Apr, 2012 1 commit
  2. 01 Mar, 2012 4 commits
  3. 28 Feb, 2012 1 commit
  4. 15 Feb, 2012 3 commits
    • drm/i915: Record the position of the request upon error · ee4f42b1
      Committed by Chris Wilson
      So that we can tally the request against the command sequence in the
      ringbuffer, or merely jump to the interesting locations.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Record the in-flight requests at the time of a hang · 52d39a21
      Committed by Chris Wilson
      Being able to tally the list of outstanding requests with the sequence
      of commands in the ringbuffer is often useful evidence with respect to
      driver corruption.
      
      Note that since this is the umpteenth per-ring data structure to be added
      to the error state, I've coalesced the nearby loops (the ringbuffer and
      batchbuffer) into a single structure along with the list of requests.  A
      later task would be to refactor the ring register state into the same
      structure.
      
      v2: Fix pretty printing of requests so that they are parsed correctly by
      intel_error_decode and use the 0x%08x format for seqno for consistency
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94
      Committed by Chris Wilson
      By recording the location of every request in the ringbuffer, we know
      that in order to retire the request the GPU must have finished reading
      it and so the GPU head is now beyond the tail of the request. We can
      therefore provide a conservative estimate of where the GPU is reading
      from in order to avoid having to read back the ring buffer registers
      when polling for space upon starting a new write into the ringbuffer.
      
      A secondary effect is that this allows us to convert
      intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
      upon the single function to handle the complicated task of waiting upon
      the GPU. A necessary precaution is that we need to make that wait
      uninterruptible to match the existing conditions as all the callers of
      intel_ring_begin() have not been audited to handle ERESTARTSYS
      correctly.
      
      By using a conservative estimate for the head, and always processing all
      outstanding requests first, we prevent a race condition between using
      the estimate and direct reads of I915_RING_HEAD which could result in
      the value of the head going backwards, and the tail overflowing once
      again. We are also careful to mark any request that we skip over in
      order to free space in the ring as consumed, which provides a
      self-consistency check.
      
      Given sufficient abuse, such as a set of unthrottled GPU bound
      cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
      Sandy Bridge (i5-2520m):
        firefox-paintball  18927ms -> 15646ms: 1.21x speedup
        firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
      which is a mild consolation for the performance those traces achieved from
      exploiting the buggy autoreported head.
      
      v2: Add a few more comments and make request->tail a conservative
      estimate as suggested by Daniel Vetter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: resolve conflicts with retirement deferring and the lack of
      the autoreport head removal (that will go in through -fixes).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
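The head estimate described in a71d8d94 can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the driver's code: `struct ring`, `struct request`, `RING_SIZE`, `MAX_REQUESTS`, the 8-byte gap, the use of seqno 0 as "none", and all helper names are hypothetical. The idea it shows is that retiring a request lets the CPU advance its copy of the head to that request's recorded tail, so free space can be computed without reading a ring register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE    4096u  /* bytes; hypothetical, must be a power of two */
#define MAX_REQUESTS 16     /* hypothetical bound on in-flight requests */

struct request {
    uint32_t seqno;  /* sequence number signalled when the request completes */
    uint32_t tail;   /* ring offset just past the request's commands */
};

struct ring {
    uint32_t head;   /* conservative CPU-side estimate of the GPU read offset */
    uint32_t tail;   /* CPU write offset */
    struct request requests[MAX_REQUESTS];  /* oldest first */
    int nreq;
};

/* Wrap-safe "a is at or after b" comparison of sequence numbers. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) >= 0;
}

/* Bytes free for writing if the head were at `head`, keeping a small gap
 * so that the tail never catches up with the head. */
static uint32_t space_with_head(uint32_t head, uint32_t tail)
{
    return (head - tail - 8u) & (RING_SIZE - 1u);
}

static uint32_t ring_space(const struct ring *r)
{
    return space_with_head(r->head, r->tail);
}

/* Retire every request the GPU has completed. The GPU must have read past
 * each retired request, so its tail is a safe lower bound for the head. */
static void ring_retire(struct ring *r, uint32_t completed_seqno)
{
    int done = 0;

    assert(r->nreq >= 0 && r->nreq <= MAX_REQUESTS);
    while (done < r->nreq &&
           seqno_passed(completed_seqno, r->requests[done].seqno)) {
        r->head = r->requests[done].tail;
        done++;
    }
    for (int i = done; i < r->nreq; i++)
        r->requests[i - done] = r->requests[i];
    r->nreq -= done;
}

/* Seqno of the oldest request whose retirement would free at least n bytes,
 * or 0 if no outstanding request would. */
static uint32_t ring_request_to_wait_for(const struct ring *r, uint32_t n)
{
    for (int i = 0; i < r->nreq; i++) {
        if (space_with_head(r->requests[i].tail, r->tail) >= n)
            return r->requests[i].seqno;
    }
    return 0;
}
```

A caller that needs `n` bytes would first check `ring_space()`, and only if that falls short wait for the seqno returned by `ring_request_to_wait_for()` and then call `ring_retire()`.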
  5. 14 Feb, 2012 2 commits
  6. 13 Feb, 2012 1 commit
    • drm/i915: fixup seqno allocation logic for lazy_request · 53d227f2
      Committed by Daniel Vetter
      Currently we reserve seqnos only when we emit the request to the ring
      (by bumping dev_priv->next_seqno), but start using it much earlier for
      ring->outstanding_lazy_request. When 2 threads compete for the gpu and
      run on two different rings (e.g. ddx on blitter vs. compositor)
      hilarity ensued, especially when we get constantly interrupted while
      reserving buffers.
      
      Breakage seems to have been introduced in
      
      commit 6f392d54
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sat Aug 7 11:01:22 2010 +0100
      
          drm/i915: Use a common seqno for all rings.
      
      This patch fixes up the seqno reservation logic by moving it into
      i915_gem_next_request_seqno. The ring->add_request functions now
      superfluously still return the new seqno through a pointer; that will
      be refactored in the next patch.
      
      Note that with this change we now unconditionally allocate a seqno,
      even when ->add_request might fail because the rings are full and the
      gpu died. But this does not open up a new can of worms because we can
      already leave behind an outstanding_request_seqno if e.g. the caller
      gets interrupted with a signal while stalling for the gpu in the
      eviction paths. And with the bugfix we only ever have one seqno
      allocated per ring (and only that ring), so there are no ordering
      issues with multiple outstanding seqnos on the same ring.
      
      v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
      clear that we only have one seqno counter for all rings. Suggested by
      Chris Wilson.
      
      v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
      instead of ring->outstanding_lazy_request to make the follow-up
      refactoring more clearly correct. Also improve the commit message
      with issues discussed on IRC.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
      Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
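The reservation rule from 53d227f2 can be illustrated with a small user-space sketch. The struct layouts and the function names `get_seqno`, `next_request_seqno` and `add_request` are simplified stand-ins chosen for this example, and treating seqno 0 as "nothing reserved" is an assumption. The point it shows is that the shared counter is bumped at the moment a ring first asks for its lazy seqno, so a second ring can never be handed the same number.

```c
#include <assert.h>
#include <stdint.h>

/* One seqno counter shared by all rings. */
struct device {
    uint32_t next_seqno;
};

struct ring {
    struct device *dev;
    uint32_t outstanding_lazy_request;  /* 0 means nothing reserved */
};

/* Hand out the next seqno and bump the counter immediately,
 * skipping 0 because this sketch reserves it as "none". */
static uint32_t get_seqno(struct device *dev)
{
    uint32_t seqno;

    assert(dev->next_seqno != 0);
    seqno = dev->next_seqno;
    if (++dev->next_seqno == 0)
        dev->next_seqno = 1;
    return seqno;
}

/* Reserve a seqno for this ring on first use and keep returning the
 * same one until the request is actually emitted. */
static uint32_t next_request_seqno(struct ring *ring)
{
    if (ring->outstanding_lazy_request == 0)
        ring->outstanding_lazy_request = get_seqno(ring->dev);
    return ring->outstanding_lazy_request;
}

/* Emit the request: consume the reserved seqno (reserving one now if the
 * caller never asked for it earlier). */
static uint32_t add_request(struct ring *ring)
{
    uint32_t seqno = next_request_seqno(ring);

    ring->outstanding_lazy_request = 0;
    return seqno;
}
```

With the pre-fix behaviour the commit describes, where the counter is only bumped when the request is emitted, two rings that both peek at the counter before either emits would see the same value; here each ring owns its number from the first call.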
  7. 12 Feb, 2012 1 commit
  8. 10 Feb, 2012 3 commits
  9. 09 Feb, 2012 2 commits
    • drm/i915: dump even more into the error_state · 7e3b8737
      Committed by Daniel Vetter
      Chris Wilson and I have again stared at funny error states and it's
      been pretty clear from the start that something was seriously amiss.
      The seqnos last seen by the cpu were a few hundred behind those that
      the gpu could have possibly emitted last before it died ...
      
      Chris now tracked it down (hopefully, the definitive verdict is still out),
      but in hindsight we'd have found the bug by simply dumping the cpu
      side tracking of the ring head and tail registers.
      
      Fix this and prevent an identical time-waster in the future.
      
      Because the hangs always involved semaphores in one way or another,
      we've tried to dump the mbox registers, but couldn't find any
      inconsistencies. Still, dump them too.
      Reviewed-and-wanted-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: swizzling support for snb/ivb · f691e2f4
      Committed by Daniel Vetter
      We have to do this manually. Somebody had a Great Idea.
      
      I've measured speed-ups just a few percent above the noise level
      (below 5% for the best case), but no slowdowns. Chris Wilson measured
      quite a bit more (10-20% above the usual snb variance) on a more
      recent and better tuned version of sna, but also recorded a few
      slow-downs on benchmarks known for uglier amounts of snb-induced
      variance.
      
      v2: Incorporate Ben Widawsky's preliminary review comments and
      elaborate a bit about the performance impact in the changelog.
      
      v3: Add a comment as to why we don't need to check the 3rd memory
      channel.
      
      v4: Fixup whitespace.
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  10. 30 Jan, 2012 4 commits
  11. 26 Jan, 2012 1 commit
  12. 20 Jan, 2012 1 commit
    • drm/i915: protect force_wake_(get|put) with the gt_lock · 9f1f46a4
      Committed by Daniel Vetter
      The problem this patch solves is that the forcewake accounting
      necessary for register reads is protected by dev->struct_mutex. But the
      hangcheck and error_capture code need to access registers without
      grabbing this mutex because we hold it while waiting for the gpu.
      So a new lock is required. Because currently the error_state capture
      is called from the error irq handler and the hangcheck code runs from
      a timer, it needs to be an irqsafe spinlock (note that the irq handler,
      neglecting the error handling part, only uses registers that don't
      need the forcewake dance).
      
      We could tune this down to a normal spinlock when we rework the
      error_state capture and hangcheck code to run from a workqueue.  But
      we don't have any read in a fastpath that needs forcewake, so I've
      decided to not care much about overhead.
      
      This prevents tests/gem_hangcheck_forcewake from i-g-t from killing my
      snb on recent kernels - something must have slightly changed the
      timings. On previous kernels it only triggered a WARN about the broken
      locking.
      
      v2: Drop the previous patch for the register writes.
      
      v3: Improve the commit message per Chris Wilson's suggestions.
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
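The accounting that 9f1f46a4 moves under a dedicated lock can be sketched in portable C11. This is a user-space analogy only: a C11 `atomic_flag` spin loop stands in for the kernel's irq-safe spinlock (a user-space program has no interrupts to disable), and `struct gt`, its fields, and the helper names are hypothetical. It shows the reference count and the hardware wake state changing together under one small lock that does not depend on any larger mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct gt {
    atomic_flag lock;          /* stand-in for the dedicated gt lock */
    unsigned forcewake_count;  /* number of current forcewake holders */
    bool hw_awake;             /* stand-in for the hardware wake request */
};

static void gt_init(struct gt *gt)
{
    atomic_flag_clear(&gt->lock);
    gt->forcewake_count = 0;
    gt->hw_awake = false;
}

static void gt_lock(struct gt *gt)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&gt->lock, memory_order_acquire))
        ;
}

static void gt_unlock(struct gt *gt)
{
    atomic_flag_clear_explicit(&gt->lock, memory_order_release);
}

/* First holder wakes the hardware; later holders only bump the count. */
static void force_wake_get(struct gt *gt)
{
    gt_lock(gt);
    if (gt->forcewake_count++ == 0)
        gt->hw_awake = true;
    gt_unlock(gt);
}

/* Last holder lets the hardware go back to sleep. */
static void force_wake_put(struct gt *gt)
{
    gt_lock(gt);
    assert(gt->forcewake_count > 0);
    if (--gt->forcewake_count == 0)
        gt->hw_awake = false;
    gt_unlock(gt);
}
```

Because the count and the wake state change inside the same critical section, a reader arriving from a timer or error path sees either "asleep with zero holders" or "awake with at least one holder", never a mix.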
  13. 18 Jan, 2012 2 commits
  14. 04 Jan, 2012 2 commits
    • drm/i915: add SNB and IVB video sprite support v6 · b840d907
      Committed by Jesse Barnes
      The video sprites support various video surface formats natively and can
      handle scaling as well.  So add support for them using the new DRM core
      sprite support functions.
      
      v2: use drm specific fourcc header and defines
      v3: address Daniel's comments:
        - don't take struct mutex around register access (only needed for
          regs in the GT power well)
        - don't hold struct mutex across vblank waits
        - fix up update_plane API (pass obj instead of GTT offset)
        - add interlaced defines for sprite regs
        - drop unnecessary 'reg' variables
        - comment double buffered reg flushing
        Also fix w/h confusion when writing the scaling reg.
      v4: more fixes, address more comments from Daniel, and include Hai's fix
        - prevent divide by zero in scaling calculation (Hai Lan)
        - update to Ville's new DRM_FORMAT_* types
        - fix sprite watermark handling (calc based on CRTC size, separate
          from normal display wm)
        - remove private refcounts now that the fb cleanup handles things
      v5: add linear surface support
      v6: remove color key clearing & setting from update_plane
      
      For this version, I tested DPMS since it came up in the last review;
      DPMS off/on works ok when a video player is working under X, but for
      power saving we'll probably want to do something smarter.  I'll leave
      that for a separate patch on top.  Likewise with the refcounting/fb
      layer handling, which are really separate cleanups.
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Keith Packard <keithp@keithp.com>
    • drm/i915: Clean up multi-threaded forcewake patch · c7dffff7
      Committed by Keith Packard
      We learned that the ECOBUS register was inside the GT power well, and
      so *did* need force wake to be read, so it gets removed from the list
      of 'doesn't need force wake' registers.
      
      That means the code reading ECOBUS after forcing the mt_force_wake
      function to be called needs to use I915_READ_NOTRACE; it doesn't need
      to do more force wake fun as it's already done it manually.
      
      This also adds a comment explaining why the MT forcewake testing code
      only needs to call mt_forcewake_get/put and not disable RC6 manually
      -- the ECOBUS read will return 0 if the device is in RC6 and isn't
      using MT forcewake, causing the test to work correctly.
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
  15. 20 Dec, 2011 1 commit
    • drm/i915: check ACTHD of all rings · 097354eb
      Committed by Daniel Vetter
      Otherwise hangcheck spuriously fires when running blitter/bsd-only
      workloads.
      
      Contrary to a similar patch by Ben Widawsky this does not check
      INSTDONE of the other rings. Chris Wilson implied that checking it
      resulted in a failure to detect a hang, most likely because INSTDONE
      was fluctuating. Thus only
      check ACTHD, which as far as I know is rather reliable. Also, blitter
      and bsd rings can't launch complex tasks from a single instruction
      (like 3D_PRIM on the render with complex or even infinite shaders).
      
      This fixes spurious gpu hang detection when running
      tests/gem_hangcheck_forcewake on snb/ivb.
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Keith Packard <keithp@keithp.com>
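The check described in 097354eb can be sketched as follows. This is a simplified user-space model: the ring enumeration, `struct hangcheck`, the function name, and the "two consecutive stalled samples" threshold are assumptions for illustration, and the sketch ignores INSTDONE entirely. It shows why sampling ACTHD on every ring avoids a spurious hang report when only the blitter or BSD ring is busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RENDER, BSD, BLT, NUM_RINGS };

struct hangcheck {
    uint32_t last_acthd[NUM_RINGS];  /* ACTHD values from the previous sample */
    int stalled_samples;             /* consecutive samples with no movement */
};

/* Feed one periodic sample of every ring's ACTHD. Returns true once no ring
 * has moved for two consecutive samples (threshold is an assumption). */
static bool hangcheck_sample(struct hangcheck *hc,
                             const uint32_t acthd[NUM_RINGS])
{
    bool progress = false;

    assert(hc && acthd);
    for (int i = 0; i < NUM_RINGS; i++) {
        if (acthd[i] != hc->last_acthd[i])
            progress = true;
        hc->last_acthd[i] = acthd[i];
    }

    if (progress) {
        hc->stalled_samples = 0;
        return false;
    }
    return ++hc->stalled_samples > 1;
}
```

If only the render ring's ACTHD were compared, a workload that keeps the blitter busy while the render ring idles would look stalled after two samples; with all rings compared, movement on any one of them resets the counter.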
  16. 17 Dec, 2011 3 commits
  17. 24 Nov, 2011 1 commit
  18. 11 Nov, 2011 1 commit
  19. 04 Nov, 2011 1 commit
  20. 21 Oct, 2011 3 commits
  21. 06 Oct, 2011 2 commits
    • drm/i915: Move eDP panel fixed mode from dev_priv to intel_dp · d15456de
      Committed by Keith Packard
      This value doesn't come directly from the VBT, and so is rather
      specific to the particular DP output.
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Correct eDP panel power sequencing delay computations · f01eca2e
      Committed by Keith Packard
      Store the panel power sequencing delays in the dp private structure,
      rather than the global device structure. Who knows, maybe we'll get
      more than one eDP device in the future.
      
      From the eDP spec, we need the following numbers:
      
       T1 + T3	Power on to Aux Channel operation (panel_power_up_delay)
      
      		This marks how long it takes the panel to boot up and
      		get ready to receive aux channel communications.
      
       T8		Video signal to backlight on (backlight_on_delay)
      
      		Once a valid video signal is being sent to the device,
      		it can take a while before the panel is actually
      		showing useful data. This delay allows the panel
      		to get something reasonable up before the backlight
      		is turned on.
      
       T9		Backlight off to video off (backlight_off_delay)
      
      		Turning the backlight off can take a moment, so
      		this delay makes sure there is still valid video
      		data on the screen.
      
       T10		Video off to power off (panel_power_down_delay)
      
      		Presumably this delay allows the panel to perform
      		an orderly shutdown of the display.
      
       T11 + T12	Power off to power on (panel_power_cycle_delay)
      
      		So, once you turn the panel off, you have to wait a
      		while before you can turn it back on. This delay is
      		usually the longest in the entire sequence.
      
      Neither the VBIOS source code nor the hardware documentation has a
      clear mapping between the delay values they provide and those required
      by the eDP spec. The VBIOS code actually uses two different labels for
      the delay values in the five words of the relevant VBT table.
      
      **** MORE LATER ***
      
      Look at both the current hardware register settings and the VBT
      specified panel power sequencing timings. Use the maximum of the two
      delays, to make sure things work reliably. If there is no VBT data,
      then those values will be initialized to zero, so we'll just use the
      values as programmed in the hardware. Note that the BIOS just fetches
      delays from the VBT table to place in the hardware registers, so we
      should get the same values from both places, except for rounding.
      
      VBT doesn't provide any values for T1 or T2, so we'll always just use
      the hardware value for that.
      
      The panel power up delay is thus T1 + T2 + T3, which should be
      sufficient in all cases.
      
      The panel power down delay is T1 + T2 + T12, using T1+T2 as a proxy
      for T11, which isn't available anywhere.
      
      For the backlight delays, the eDP spec says T6 + T8 is the delay from the
      end of link training to backlight on and T9 is the delay from
      backlight off until video off. The hardware provides a 'backlight on'
      delay, which I'm taking to be T6 + T8 while the VBT provides something
      called 'T7', which I'm assuming is s
      
      On the macbook air I'm testing with, this yields a power-up delay of
      over 200ms and a power-down delay of over 600ms. It all works now, but
      we're frobbing these power controls several times during mode setting,
      making the whole process take an awfully long time.
      Signed-off-by: Keith Packard <keithp@keithp.com>
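The "use the maximum of the two delays" rule from f01eca2e can be sketched as below. The field names follow the delay names given in the commit message, but `struct pps_delays`, `max_delay` and `merge_delays` are hypothetical, both inputs are assumed to be in the same unit, and the sketch does not attempt the T-number mapping that the message leaves unfinished.

```c
#include <assert.h>

/* The five delays named in the commit message, all in one common unit. */
struct pps_delays {
    unsigned panel_power_up_delay;     /* power on to aux channel operation */
    unsigned backlight_on_delay;       /* video signal to backlight on */
    unsigned backlight_off_delay;      /* backlight off to video off */
    unsigned panel_power_down_delay;   /* video off to power off */
    unsigned panel_power_cycle_delay;  /* power off to power on */
};

static unsigned max_delay(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Combine the delays read back from the hardware registers with those from
 * the VBT by taking the larger of each pair. A missing VBT is represented
 * by all-zero values, so the hardware values win in that case. */
static struct pps_delays merge_delays(const struct pps_delays *hw,
                                      const struct pps_delays *vbt)
{
    struct pps_delays out;

    assert(hw && vbt);
    out.panel_power_up_delay =
        max_delay(hw->panel_power_up_delay, vbt->panel_power_up_delay);
    out.backlight_on_delay =
        max_delay(hw->backlight_on_delay, vbt->backlight_on_delay);
    out.backlight_off_delay =
        max_delay(hw->backlight_off_delay, vbt->backlight_off_delay);
    out.panel_power_down_delay =
        max_delay(hw->panel_power_down_delay, vbt->panel_power_down_delay);
    out.panel_power_cycle_delay =
        max_delay(hw->panel_power_cycle_delay, vbt->panel_power_cycle_delay);
    return out;
}
```

Taking the maximum errs on the side of waiting too long, which matches the commit's stated goal of reliability over speed.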