1. 07 Jun 2013, 1 commit
  2. 03 Jun 2013, 1 commit
    • drm/i915: detect hang using per ring hangcheck_score · 05407ff8
      Authored by Mika Kuoppala
      Keep track of ring seqno progress and if there are no
      progress detected, declare hang. Use actual head (acthd)
      to distinguish between ring stuck and batchbuffer looping
      situation. Stuck ring will be kicked to trigger progress.
      
      This commit adds a hard limit for batchbuffer completion time.
      If batchbuffer completion time is more than 4.5 seconds,
      the gpu will be declared hung.
      
      Review comment from Ben which nicely clarifies the semantic change:
      
      "Maybe I'm just stating the functional changes of the patch, but in case
      they were unintended here is what I see as potential issues:
      
      1. "If ring B is waiting on ring A via semaphore, and ring A is making
         progress, albeit slowly - the hangcheck will fire. The check will
         determine that A is moving, however ring B will appear hung because
         the ACTHD doesn't move. I honestly can't say if that's actually a
         realistic problem to hit; it probably implies the timeout value is
         too low.
      
      2. "There's also another corner case on the kick. If the seqno = 2
         (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
         we try to kick it... we don't actually try to find out if the kick
         helped"
      
      v2: use acthd to detect stuck ring from loop (Ben Widawsky)
      
      v3: Use acthd to check when ring needs kicking.
      Declare hang on third time in order to give time for
      kick_ring to take effect.
      
      v4: Update commit msg
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Paste in Ben's review comment.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
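The scoring scheme described above can be modeled in a few lines of plain C. This is a sketch only: `hangcheck_tick`, `RING_KICK` and the threshold of 3 are illustrative stand-ins, not the kernel's actual symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Model of per-ring hangcheck: if the seqno has not advanced since the
 * last check, bump a score. A moving ACTHD means the batch is still
 * executing (looping); a frozen ACTHD means the ring itself is stuck
 * and should be kicked. Hang is declared on the third strike so that
 * kick_ring has time to take effect. */
enum ring_state { RING_ACTIVE, RING_KICK, RING_HUNG };

struct ring_check {
    uint32_t last_seqno;
    uint32_t last_acthd;
    int score;
};

#define HANG_SCORE 3 /* illustrative threshold */

enum ring_state hangcheck_tick(struct ring_check *rc,
                               uint32_t seqno, uint32_t acthd)
{
    if (seqno != rc->last_seqno) {
        /* progress detected: reset everything */
        rc->last_seqno = seqno;
        rc->last_acthd = acthd;
        rc->score = 0;
        return RING_ACTIVE;
    }
    rc->score++;
    if (rc->score >= HANG_SCORE)
        return RING_HUNG;
    if (acthd == rc->last_acthd)
        return RING_KICK;    /* ring stuck: try kicking it */
    rc->last_acthd = acthd;
    return RING_ACTIVE;      /* batchbuffer looping: head still moves */
}
```

Note how the ACTHD comparison distinguishes the two no-seqno-progress cases the commit message describes: a looping batch keeps the head moving, a stuck ring does not.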
  3. 01 Jun 2013, 7 commits
  4. 06 May 2013, 1 commit
    • drm/i915: put context upon switching · 112522f6
      Authored by Chris Wilson
      In order to be notified of when the context and all of its associated
      objects are idle (for when the context maps to a ppgtt) we need a
      callback from the retire handler. We can arrange this by using the
      kref_get/put of the context for request tracking and by inserting a
      request to demarcate the switch away from the old context.
      
      [Ben: fixed a minor error so the patch compiles, AND s/last_context/from/]
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
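The kref scheme above can be reduced to a toy model: every outstanding request takes a reference on the context it ran under, and retiring the final request drops the last reference and fires the idle notification. All names here are illustrative, not i915's.

```c
#include <assert.h>

/* Toy model of request-tracked context lifetime: the "idle" flag stands
 * in for the callback that fires once the context and its objects are
 * no longer referenced by any in-flight request. */
struct hw_context {
    int refcount;
    int idle;   /* set when the last reference is dropped */
};

static void context_get(struct hw_context *ctx)
{
    ctx->refcount++;
}

static void context_put(struct hw_context *ctx)
{
    if (--ctx->refcount == 0)
        ctx->idle = 1;   /* stand-in for the retire-handler callback */
}
```

The point of the design is that the driver never has to poll: idleness falls out of the last `context_put` issued by request retirement.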
  5. 19 Dec 2012, 2 commits
  6. 18 Dec 2012, 1 commit
    • drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc
      Authored by Daniel Vetter
      Now that Chris Wilson demonstrated that the key for stability on early
      gen 2 is to simply _never_ exchange the physical backing storage of
      batch buffers, I've taken a stab at a kernel solution. Doesn't look too
      nefarious imho, now that I don't try to be too clever for my own good
      any more.
      
      v2: After discussing the various techniques, we've decided to always blit
      batches on the suspect devices, but allow userspace to opt out of the
      kernel workaround and assume full responsibility for providing coherent
      batches. The principal reason is that avoiding the blit does improve
      performance in a few key microbenchmarks and also in cairo-trace
      replays.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet:
      - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
        wrap w/a. Suggested by Chris Wilson.
      - Also add the ACTHD check from Chris Wilson for the error state
        dumping, so that we still catch batches when userspace opts out of
        the w/a.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
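The v2 policy can be sketched as a single decision function: on the affected devices the kernel copies each batch into a scratch buffer whose physical backing never changes, unless userspace sets a flag accepting responsibility for coherent batches itself. The flag name below is hypothetical, not the real execbuffer flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opt-out flag: userspace promises coherent batches. */
#define EXEC_BATCH_COHERENT_OPTOUT 0x1

/* Decide whether the CS-TLB workaround must blit the batch into the
 * fixed scratch buffer before execution. */
static bool needs_batch_copy(bool has_broken_cs_tlb, unsigned int exec_flags)
{
    return has_broken_cs_tlb &&
           !(exec_flags & EXEC_BATCH_COHERENT_OPTOUT);
}
```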
  7. 06 Dec 2012, 1 commit
  8. 04 Dec 2012, 1 commit
  9. 29 Nov 2012, 2 commits
    • drm/i915: Rearrange code to only have a single method for waiting upon the ring · 3e960501
      Authored by Chris Wilson
      Replace the wait for the ring to be clear with the more common wait for
      the ring to be idle. The principle advantage is one less exported
      intel_ring_wait function, and the removal of a hardcoded value.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Preallocate next seqno before touching the ring · 9d773091
      Authored by Chris Wilson
      Based on the work by Mika Kuoppala, we realised that we need to handle
      seqno wraparound prior to committing our changes to the ring. The most
      obvious point then is to grab the seqno inside intel_ring_begin(), and
      then to reuse that seqno for all ring operations until the next request.
      As intel_ring_begin() can fail, the callers must already be prepared to
      handle such failure and so we can safely add further checks.
      
      This patch looks like it should be split up into the interface
      changes and the tweaks to move seqno wrapping from the execbuffer into
      the core seqno increment. However, I found no easy way to break it into
      incremental steps without introducing further broken behaviour.
      
      v2: Mika found a silly mistake and a subtle error in the existing code;
      inside i915_gem_retire_requests() we were resetting the sync_seqno of
      the target ring based on the seqno from this ring - which are only
      related by the order of their allocation, not retirement. Hence we were
      applying the optimisation that the rings were synchronised too early,
      fortunately the only real casualty there is the handling of seqno
      wrapping.
      
      v3: Do not forget to reset the sync_seqno upon module reinitialisation,
      ala resume.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
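The key idea above is that wraparound is handled in `intel_ring_begin()`, where failure is still allowed, and the reserved seqno is then reused for every emit until the request is added. A minimal model, with purely illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of seqno preallocation: next_seqno == 0 means "none reserved". */
struct ring {
    uint32_t next_seqno;
    uint32_t last_seqno;
};

/* Stand-in for the wraparound handling (idling + resetting seqnos). */
static int handle_seqno_wrap(struct ring *r)
{
    r->last_seqno = 0;
    return 0;
}

static int ring_begin(struct ring *r)
{
    if (r->next_seqno == 0) {
        if (r->last_seqno == UINT32_MAX) {
            /* Safe to fail here: nothing has touched the ring yet. */
            if (handle_seqno_wrap(r))
                return -1;
        }
        r->next_seqno = r->last_seqno + 1;
    }
    return 0;
}

static uint32_t ring_add_request(struct ring *r)
{
    uint32_t seqno = r->next_seqno; /* same value used by all emits */

    r->last_seqno = seqno;
    r->next_seqno = 0;              /* next begin reserves a fresh one */
    return seqno;
}
```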
  10. 12 Nov 2012, 1 commit
  11. 18 Oct 2012, 1 commit
    • drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers · d7d4eedd
      Authored by Chris Wilson
      With the introduction of per-process GTT space, the hardware designers
      thought it wise to also limit the ability to write to MMIO space to only
      a "secure" batch buffer. The ability to rewrite registers is the only
      way to program the hardware to perform certain operations like scanline
      waits (required for tear-free windowed updates). So we either have a
      choice of adding an interface to perform those synchronized updates
      inside the kernel, or we permit certain processes the ability to write
      to the "safe" registers from within its command stream. This patch
      exposes the ability to submit a SECURE batch buffer to
      DRM_ROOT_ONLY|DRM_MASTER processes.
      
      v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
      bit (bit 13, accidentally not set). Also add a comment explaining why
      secure batches need a global gtt binding.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      [danvet: added hsw fixup.]
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
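The privilege gate described above boils down to a single check at execbuffer time. A sketch with illustrative names (the -13 return models an -EACCES-style rejection):

```c
#include <assert.h>
#include <stdbool.h>

/* A SECURE batch (one allowed to write "safe" MMIO registers from its
 * command stream) is only accepted from a privileged file, i.e. one
 * that satisfies DRM_ROOT_ONLY|DRM_MASTER. */
static int check_secure_exec(bool wants_secure, bool is_master_or_root)
{
    if (wants_secure && !is_master_or_root)
        return -13;  /* reject: unprivileged caller asked for SECURE */
    return 0;
}
```

Ordinary, non-secure batches remain available to everyone; only the MMIO-writing capability is gated.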
  12. 10 Aug 2012, 1 commit
    • drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8
      Authored by Chris Wilson
      Avoid the forcewake overhead when simply retiring requests, as often the
      last seen seqno is good enough to satisfy the retirement process and will
      be promptly re-run in any case. Only ensure that we force the coherent
      seqno read when we are explicitly waiting upon a completion event to be
      sure that none go missing, and also for when we are reporting seqno
      values in case of error or debugging.
      
      This greatly reduces the load for userspace using the busy-ioctl to
      track active buffers, for instance halving the CPU used by X in pushing
      the pixels from a software render (flash). The effect will be even more
      magnified with userptr and so providing a zero-copy upload path in that
      instance, or in similar instances where X is simply compositing DRI
      buffers.
      
      v2: Reverse the polarity of the tachyon stream. Daniel suggested that
      'force' was too generic for the parameter name and that 'lazy_coherency'
      better encapsulated the semantics of it being an optimization and its
      purpose. Also notice that gen6_get_seqno() is only used by gen6/7
      chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
      function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
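The `lazy_coherency` parameter can be modeled as a choice between two read paths: the coherent one pays the cost of synchronising with the hardware (forcewake), the lazy one just returns the last value that happened to be snooped. The structure and names below are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct seqno_source {
    uint32_t hw_seqno;       /* what a forced, coherent read would see */
    uint32_t cached_seqno;   /* last lazily observed value */
};

static uint32_t get_seqno(struct seqno_source *s, bool lazy_coherency)
{
    if (!lazy_coherency)
        s->cached_seqno = s->hw_seqno; /* stand-in for the forced read */
    return s->cached_seqno;
}
```

Retirement can tolerate a stale value (it will simply run again soon), so it takes the lazy path; explicit waits and error reporting must not miss a completion, so they force coherency.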
  13. 26 Jul 2012, 2 commits
  14. 20 Jun 2012, 1 commit
    • drm/i915: disable flushing_list/gpu_write_list · cc889e0f
      Authored by Daniel Vetter
      This is just the minimal patch to disable all this code so that we can
      do decent amounts of QA before we rip it all out.
      
      The complicating thing is that we need to flush the gpu caches after
      the batchbuffer is emitted. Which is past the point of no return where
      execbuffer can't fail any more (otherwise we risk submitting the same
      batch multiple times).
      
      Hence we need to add a flag to track whether any caches associated
      with that ring are dirty. And emit the flush in add_request if that's
      the case.
      
      Note that this has quite a few behaviour changes:
      - Caches get flushed/invalidated unconditionally.
      - Invalidation now happens after potential inter-ring sync.
      
      I've bantered around a bit with Chris on irc whether this fixes
      anything, and it might or might not. The only thing clear is that with
      these changes it's much easier to reason about correctness.
      
      Also rip out a lone get_next_request_seqno in the execbuffer
      retire_commands function. I've dug around and I couldn't figure out
      why that is still there, with the outstanding lazy request stuff it
      shouldn't be necessary.
      
      v2: Chris Wilson complained that I also invalidate the read caches
      when flushing after a batchbuffer. Now optimized.
      
      v3: Added some comments to explain the new flushing behaviour.
      
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
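The dirty-flag scheme above can be sketched in a few lines: execbuffer marks the ring's caches dirty after emitting the batch (past the point of no return), and add_request emits the flush only when needed. A counter stands in for the actual ring commands; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

struct ring_model {
    bool gpu_caches_dirty;
    int flushes_emitted;
};

static void emit_batch(struct ring_model *r)
{
    /* Past this point execbuffer cannot fail, so the flush must be
     * deferred to a context where failure is harmless. */
    r->gpu_caches_dirty = true;
}

static void add_request(struct ring_model *r)
{
    if (r->gpu_caches_dirty) {
        r->flushes_emitted++;      /* emit the cache flush here */
        r->gpu_caches_dirty = false;
    }
}
```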
  15. 14 Jun 2012, 3 commits
    • drm/i915: possibly invalidate TLB before context switch · 12b0286f
      Authored by Ben Widawsky
      From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf
      
      [DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's
      responsibility to invalidate the TLBs at least once after the previous
      context switch after any GTT mappings changed (including new GTT
      entries).  This can be done by a pipelined PIPE_CONTROL with TLB inv bit
      set immediately before MI_SET_CONTEXT.
      
      On GEN7 the invalidation mode is explicitly set, but this appears to be
      lacking for GEN6. Since I don't know the history on this, I've decided
      to dynamically read the value at ring init time, and use that value
      throughout.
      
      v2: better comment (daniel)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    • drm/i915: context switch implementation · e0556841
      Authored by Ben Widawsky
      Implement the context switch code as well as the interfaces to do the
      context switch. This patch also doesn't match 1:1 with the RFC patches.
      The main difference is that from Daniel's responses the last context
      object is now stored instead of the last context. This allows us
      to free the context data structure and context object independently.
      
      There is room for optimization: this code will pin the context object
      until the next context is active. The optimal way to do it is to
      actually pin the object, move it to the active list, do the context
      switch, and then unpin it. This allows the eviction code to actually
      evict the context object if needed.
      
      The context switch code is missing workarounds, they will be implemented
      in future patches.
      
      v2: actually do obj->dirty=1 in switch (daniel)
      Modified comment around above
      Remove flags to context switch (daniel)
      Move mi_set_context code to i915_gem_context.c (daniel)
      Remove seqno, use lazy request instead (daniel)
      
      v3: use i915_gem_request_next_seqno instead of
            outstanding_lazy_request (Daniel)
      remove id's from trace events (Daniel)
      Put the context BO in the instruction domain (Daniel)
      Don't unref the BO if context switch fails (Chris)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
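The pinning behaviour discussed above (pin the context object until the next context becomes active) can be modeled with a pin counter. This is a sketch of the lifetime rule only, with illustrative names:

```c
#include <assert.h>
#include <stddef.h>

struct ctx_obj {
    int pin_count;
};

/* Object backing the currently active context. */
static struct ctx_obj *last_ctx_obj;

static void do_switch(struct ctx_obj *next)
{
    next->pin_count++;             /* pin before touching the ring */
    if (last_ctx_obj)
        last_ctx_obj->pin_count--; /* previous context is now evictable */
    last_ctx_obj = next;
}
```

The optimisation the commit message points at would instead unpin immediately after the switch and rely on the active list, letting the eviction code reclaim the old context object sooner.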
    • drm/i915: context basic create & destroy · 40521054
      Authored by Ben Widawsky
      Invent an abstraction for a hw context which is passed around through
      the core functions. The main bit a hw context holds is the buffer object
      which backs the context. The rest of the members are just helper
      functions. Specifically the ring member, which could likely go away if
      we decide to never implement whatever other hw context support exists.
      
      Of note here is the introduction of the 64k alignment constraint for the
      BO. If contexts become heavily used, we should consider tweaking this
      down to 4k. Until the contexts are merged and tested a bit though, I
      think 64k is a nice start (based on docs).
      
      Since we don't yet switch contexts, there is really not much complexity
      here. Creation/destruction works pretty much as one would expect. An idr
      is used to generate the context id numbers which are unique per file
      descriptor.
      
      v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
      convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
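Two concrete details above are easy to illustrate: the context BO size is rounded up to the 64k alignment constraint, and context ids are unique per file descriptor. A bare counter stands in for the kernel's idr here; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define CTX_ALIGN (64u << 10)                        /* 64k, per docs */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1)) /* a: power of 2 */

/* Per-file-descriptor state: id allocation is scoped to the file. */
struct drm_file_model {
    uint32_t next_ctx_id;
};

static uint32_t create_context(struct drm_file_model *file,
                               uint32_t hw_ctx_size, uint32_t *bo_size)
{
    *bo_size = ALIGN_UP(hw_ctx_size, CTX_ALIGN);
    return file->next_ctx_id++;
}
```

If contexts become heavily used, dropping `CTX_ALIGN` toward 4k is the tweak the commit message anticipates.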
  16. 20 May 2012, 1 commit
    • drm/i915: Introduce for_each_ring() macro · b4519513
      Authored by Chris Wilson
      In many places we wish to iterate over the rings associated with the
      GPU, so refactor them to use a common macro.
      
      Along the way, there are a few code removals that should be side-effect
      free and some rearrangement which should only have a cosmetic impact,
      such as error-state.
      
      Note that this slightly changes the semantics in the hangcheck code:
      We now always cycle through all enabled rings instead of
      short-circuiting the logic.
      
      v2: Pull in a couple of suggestions from Ben and Daniel for
      intel_ring_initialized() and not removing the warning (just moving them
      to a new home, closer to the error).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Added note to commit message about the small behaviour
      change, suggested by Ben Widawsky.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
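A self-contained sketch of what such an iterator macro can look like (the real macro's spelling may differ): the comma operator assigns the ring pointer, then the if-condition skips rings that were never initialised, so the loop body only ever sees enabled rings.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 3 /* illustrative */

struct ring_s {
    bool initialized;
};

struct drv_priv {
    struct ring_s ring[NUM_RINGS];
};

#define for_each_ring(r, priv, i)                           \
    for ((i) = 0; (i) < NUM_RINGS; (i)++)                   \
        if (((r) = &(priv)->ring[(i)]), (r)->initialized)

static int count_active(struct drv_priv *priv)
{
    struct ring_s *ring;
    int i, n = 0;

    for_each_ring(ring, priv, i)
        n++;
    return n;
}
```

Cycling through all enabled rings like this (rather than short-circuiting) is exactly the small semantic change in the hangcheck code the commit message notes.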
  17. 03 May 2012, 5 commits
  18. 13 Apr 2012, 1 commit
  19. 10 Apr 2012, 1 commit
  20. 15 Feb 2012, 1 commit
    • drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94
      Authored by Chris Wilson
      By recording the location of every request in the ringbuffer, we know
      that in order to retire the request the GPU must have finished reading
      it and so the GPU head is now beyond the tail of the request. We can
      therefore provide a conservative estimate of where the GPU is reading
      from in order to avoid having to read back the ring buffer registers
      when polling for space upon starting a new write into the ringbuffer.
      
      A secondary effect is that this allows us to convert
      intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
      upon the single function to handle the complicated task of waiting upon
      the GPU. A necessary precaution is that we need to make that wait
      uninterruptible to match the existing conditions as all the callers of
      intel_ring_begin() have not been audited to handle ERESTARTSYS
      correctly.
      
      By using a conservative estimate for the head, and always processing all
      outstanding requests first, we prevent a race condition between using
      the estimate and direct reads of I915_RING_HEAD which could result in
      the value of the head going backwards, and the tail overflowing once
      again. We are also careful to mark any request that we skip over in
      order to free space in ring as consumed which provides a
      self-consistency check.
      
      Given sufficient abuse, such as a set of unthrottled GPU bound
      cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
      Sandy Bridge (i5-2520m):
        firefox-paintball  18927ms -> 15646ms: 1.21x speedup
        firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
      which is a mild consolation for the performance those traces achieved from
      exploiting the buggy autoreported head.
      
      v2: Add a few more comments and make request->tail a conservative
      estimate as suggested by Daniel Vetter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: resolve conflicts with retirement deferring and the lack of
      the autoreport head removal (that will go in through -fixes).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
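The space calculation this enables can be sketched as follows: take the head from the tail recorded in the last retired request (a conservative estimate, since the GPU must have read past it to retire the request) and compute free space without ever reading I915_RING_HEAD. The small reserve constant is illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define RING_RESERVE 8 /* illustrative safety margin */

/* head: conservative estimate from the last retired request's tail;
 * tail: where the next write would go; size: ring size in bytes. */
static int ring_space(uint32_t head, uint32_t tail, uint32_t size)
{
    int space = (int)(head - (tail + RING_RESERVE));

    if (space < 0)
        space += (int)size; /* wrap: head is behind tail */
    return space;
}
```

Because the estimate only ever lags the real head, the computed space only ever under-reports, which is exactly what prevents the head-going-backwards race the commit message describes.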
  21. 30 Jan 2012, 1 commit
  22. 22 Sep 2011, 1 commit
    • drm/i915: Dumb down the semaphore logic · c8c99b0f
      Authored by Ben Widawsky
      While I think the previous code is correct, it was hard to follow and
      hard to debug. Since we already have a ring abstraction, might as well
      use it to handle the semaphore updates and compares.
      
      I don't expect this code to make semaphores better or worse, but you
      never know...
      
      v2:
      Remove magic per Keith's suggestions.
      Ran Daniel's gem_ring_sync_loop test on this.
      
      v3:
      Ignored one of Keith's suggestions.
      
      v4:
      Removed some bloat per Daniel's recommendation.
      
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Keith Packard <keithp@keithp.com>
  23. 20 Sep 2011, 1 commit
  24. 13 Jul 2011, 1 commit
  25. 11 May 2011, 1 commit