1. 04 Feb 2014, 1 commit
  2. 10 Sep 2013, 1 commit
    • drm/i915: Write RING_TAIL once per-request · 09246732
      Authored by Chris Wilson
      Ignoring the legacy DRI1 code, and a couple of special cases (to be
      discussed later), all access to the ring is mediated through requests.
      The first write to a ring will grab a seqno and mark the ring as having
      an outstanding_lazy_request. Either through explicitly adding a request
      after an execbuffer or through an implicit wait (either by the CPU or by
      a semaphore), that sequence of writes will be terminated with a request.
      So we can elide all the intervening writes to the tail register and
      send the entire command stream to the GPU at once. This will reduce the
      number of *serialising* writes to the tail register by a factor of 3-5
      times (depending upon architecture and number of workarounds, context
      switches, etc involved). This becomes even more noticeable when the
      register write is overloaded with a number of debugging tools. The
      astute reader will wonder if it is then possible to overflow the ring
      with a single command. It is not. When we start a command sequence to
      the ring, we check for available space and issue a wait if there is not
      enough. The ring wait will in this case be forced to flush the
      outstanding register write and then poll the ACTHD for sufficient space
      to continue.
      
      The exceptions to the rule that everything is inside a request are a few
      initialisation cases where we may want to write GPU commands via the CS
      before userspace wakes up and page flips.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
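      
      A minimal user-space sketch of this write-coalescing scheme follows. All
      names here (ring_emit, ring_advance, write_ring_tail, RING_SIZE) are
      illustrative stand-ins, not the actual i915 functions: each emit advances
      only a software tail, and a single serialising register write submits the
      accumulated stream when the request is finalised.
      
      ```c
      #include <stdint.h>
      #include <stdio.h>
      
      #define RING_SIZE 4096u /* illustrative; must be a power of two */
      
      struct ring {
              uint32_t tail;    /* software tail, advanced on every emit */
              uint32_t hw_tail; /* last value written to the tail register */
      };
      
      /* stand-in for the serialising MMIO write the real driver performs */
      static void write_ring_tail(struct ring *ring, uint32_t val)
      {
              printf("MMIO RING_TAIL <- %u\n", val);
              ring->hw_tail = val;
      }
      
      /* each command dword only advances the software copy: no MMIO traffic */
      static void ring_emit(struct ring *ring, uint32_t dword)
      {
              (void)dword; /* the payload would be copied into the ring here */
              ring->tail = (ring->tail + 4) & (RING_SIZE - 1);
      }
      
      /* called once per request: submit everything with a single tail write */
      static void ring_advance(struct ring *ring)
      {
              if (ring->tail != ring->hw_tail)
                      write_ring_tail(ring, ring->tail);
      }
      
      int main(void)
      {
              struct ring r = {0};
      
              for (int i = 0; i < 5; i++)
                      ring_emit(&r, 0); /* five emits, zero register writes */
              ring_advance(&r);         /* one serialising write for all five */
              return 0;
      }
      ```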
  3. 06 Sep 2013, 1 commit
  4. 05 Sep 2013, 2 commits
  5. 04 Sep 2013, 1 commit
  6. 23 Aug 2013, 1 commit
  7. 22 Aug 2013, 1 commit
  8. 11 Jul 2013, 2 commits
    • drm/i915: unify ring irq refcounts (again) · c7113cc3
      Authored by Daniel Vetter
      With the simplified locking there's no reason any more to keep the
      refcounts separate.
      
      v2: Re-add the lost comment that ring->irq_refcount is protected by
      dev_priv->irq_lock.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: kill dev_priv->rps.lock · 59cdb63d
      Authored by Daniel Vetter
      Now that the rps interrupt locking isn't clearly separated (at least
      conceptually) from all the other interrupt locking, having a different
      lock stopped making sense: it protects much more than just the rps
      workqueue it started out with. But with the addition of VECS the
      separation started to blur and resulted in some more complex locking
      for the ring interrupt refcount.
      
      With this we can (again) unify the ringbuffer irq refcounts without
      causing a massive confusion, but that's for the next patch.
      
      v2: Explain better why the rps.lock once made sense and why no longer,
      requested by Ben.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 13 Jun 2013, 1 commit
  10. 11 Jun 2013, 1 commit
    • drm/i915: Don't count semaphore waits towards a stuck ring · 6274f212
      Authored by Chris Wilson
      If we detect a ring is in a valid wait for another, just let it be.
      Eventually it will either begin to progress again, or the entire system
      will come grinding to a halt and then hangcheck will fire as soon as the
      deadlock is detected.
      
      This error was foretold by Ben in
      commit 05407ff8
      Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Date:   Thu May 30 09:04:29 2013 +0300
      
          drm/i915: detect hang using per ring hangcheck_score
      
      "If ring B is waiting on ring A via semaphore, and ring A is making
      progress, albeit slowly - the hangcheck will fire. The check will
      determine that A is moving, however ring B will appear hung because
      the ACTHD doesn't move. I honestly can't say if that's actually a
      realistic problem to hit; it probably implies the timeout value is too
      low."
      
      v2: Make sure we don't even incur the KICK cost whilst waiting.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
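      
      A hedged sketch of the rule this patch adds; hangcheck_sample and the
      field names are made up for illustration. A ring sitting in a valid
      semaphore wait is neither scored nor kicked; only a ring whose seqno and
      ACTHD both stop moving accumulates strikes towards a hang.
      
      ```c
      #include <stdint.h>
      #include <stdbool.h>
      
      enum ring_verdict { RING_ACTIVE, RING_WAIT_SEMAPHORE, RING_HUNG };
      
      struct ring_hc {
              uint32_t last_seqno;    /* completed seqno at the previous check */
              uint32_t last_acthd;    /* actual head at the previous check */
              int score;              /* strikes accumulated without progress */
              bool in_semaphore_wait; /* valid wait on another ring? */
      };
      
      enum ring_verdict hangcheck_sample(struct ring_hc *r,
                                         uint32_t seqno, uint32_t acthd)
      {
              if (seqno != r->last_seqno) {   /* requests are completing */
                      r->last_seqno = seqno;
                      r->score = 0;
                      return RING_ACTIVE;
              }
              if (r->in_semaphore_wait)       /* just let it be: no score, */
                      return RING_WAIT_SEMAPHORE; /* and no KICK cost either */
              if (acthd == r->last_acthd && ++r->score >= 3)
                      return RING_HUNG;       /* stuck three checks in a row */
              r->last_acthd = acthd;
              return RING_ACTIVE;
      }
      ```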
  11. 07 Jun 2013, 1 commit
  12. 03 Jun 2013, 1 commit
    • drm/i915: detect hang using per ring hangcheck_score · 05407ff8
      Authored by Mika Kuoppala
      Keep track of ring seqno progress and, if no progress is
      detected, declare a hang. Use the actual head (acthd)
      to distinguish a stuck ring from a looping batchbuffer.
      A stuck ring will be kicked to trigger progress.
      
      This commit adds a hard limit for batchbuffer completion time.
      If batchbuffer completion time is more than 4.5 seconds,
      the gpu will be declared hung.
      
      Review comment from Ben which nicely clarifies the semantic change:
      
      "Maybe I'm just stating the functional changes of the patch, but in case
      they were unintended here is what I see as potential issues:
      
      1. "If ring B is waiting on ring A via semaphore, and ring A is making
         progress, albeit slowly - the hangcheck will fire. The check will
         determine that A is moving, however ring B will appear hung because
         the ACTHD doesn't move. I honestly can't say if that's actually a
         realistic problem to hit; it probably implies the timeout value is too
         low.
      
      2. "There's also another corner case on the kick. If the seqno = 2
         (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
         we try to kick it... we don't actually try to find out if the kick
         helped"
      
      v2: use acthd to detect stuck ring from loop (Ben Widawsky)
      
      v3: Use acthd to check when ring needs kicking.
      Declare hang on third time in order to give time for
      kick_ring to take effect.
      
      v4: Update commit msg
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Paste in Ben's review comment.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
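      
      A sketch of the detection scheme, under the same caveat that all names
      are illustrative: seqno progress resets the score, a frozen ACTHD marks
      a stuck ring that gets kicked, and a moving ACTHD with a frozen seqno
      marks a looping batch that is only declared hung at the hard limit.
      
      ```c
      #include <stdint.h>
      #include <stdbool.h>
      #include <stdio.h>
      
      struct ring_hangcheck {
              uint32_t last_seqno;
              uint32_t last_acthd;
              int score;
      };
      
      static void kick_ring(void)
      {
              /* real driver: poke the ring to try to restore progress */
              printf("kicking ring\n");
      }
      
      /* run from a periodic timer; with a ~1.5s period, three strikes gives
       * the ~4.5 second hard limit on batchbuffer completion time */
      bool ring_hung(struct ring_hangcheck *hc, uint32_t seqno, uint32_t acthd)
      {
              if (seqno != hc->last_seqno) {  /* seqno moving: progress */
                      hc->last_seqno = seqno;
                      hc->score = 0;
                      return false;
              }
              if (acthd == hc->last_acthd)    /* head frozen: stuck, not looping */
                      kick_ring();
              hc->last_acthd = acthd;
              return ++hc->score >= 3;        /* declare hang on the third strike */
      }
      ```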
  13. 01 Jun 2013, 7 commits
  14. 06 May 2013, 1 commit
    • drm/i915: put context upon switching · 112522f6
      Authored by Chris Wilson
      In order to be notified of when the context and all of its associated
      objects are idle (for when the context maps to a ppgtt) we need a callback
      from the retire handler. We can arrange this by using the kref_get/put
      of the context for request tracking and by inserting a request to
      demarcate the switch away from the old context.
      
      [Ben: fixed a minor error so the patch compiles, AND s/last_context/from/]
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
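      
      A user-space sketch of the lifetime scheme, assuming invented names
      (context_get/context_put stand in for the kref usage): the outgoing
      context stays referenced until the request marking the switch retires.
      
      ```c
      #include <stdio.h>
      #include <stdlib.h>
      
      /* minimal stand-in for kref-style tracking of a hw context */
      struct hw_context {
              int refcount;
              const char *name;
      };
      
      static void context_get(struct hw_context *ctx) { ctx->refcount++; }
      
      static void context_put(struct hw_context *ctx)
      {
              if (--ctx->refcount == 0) {
                      /* context and its ppgtt objects are now known idle */
                      printf("context %s idle, freeing\n", ctx->name);
                      free(ctx);
              }
      }
      
      /* on a switch, keep a reference to the outgoing context; the real
       * driver drops it from the retire handler of the request that marks
       * the switch, i.e. only once the GPU has actually moved on */
      void switch_context(struct hw_context **curr, struct hw_context *to)
      {
              struct hw_context *from = *curr;
      
              context_get(to); /* pinned for as long as it is current */
              *curr = to;
              if (from)
                      context_put(from); /* deferred via a request in i915 */
      }
      ```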
  15. 19 Dec 2012, 2 commits
  16. 18 Dec 2012, 1 commit
    • drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc
      Authored by Daniel Vetter
      Now that Chris Wilson demonstrated that the key for stability on early
      gen 2 is to simply _never_ exchange the physical backing storage of
      batch buffers I've tried a stab at a kernel solution. Doesn't look too
      nefarious imho, now that I don't try to be too clever for my own good
      any more.
      
      v2: After discussing the various techniques, we've decided to always blit
      batches on the suspect devices, but allow userspace to opt out of the
      kernel workaround and assume full responsibility for providing coherent
      batches. The principal reason is that avoiding the blit does improve
      performance in a few key microbenchmarks and also in cairo-trace
      replays.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet:
      - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
        wrap w/a. Suggested by Chris Wilson.
      - Also add the ACTHD check from Chris Wilson for the error state
        dumping, so that we still catch batches when userspace opts out of
        the w/a.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  17. 06 Dec 2012, 1 commit
  18. 04 Dec 2012, 1 commit
  19. 29 Nov 2012, 2 commits
    • drm/i915: Rearrange code to only have a single method for waiting upon the ring · 3e960501
      Authored by Chris Wilson
      Replace the wait for the ring to be clear with the more common wait for
      the ring to be idle. The principal advantage is one less exported
      intel_ring_wait function, and the removal of a hardcoded value.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Preallocate next seqno before touching the ring · 9d773091
      Authored by Chris Wilson
      Based on the work by Mika Kuoppala, we realised that we need to handle
      seqno wraparound prior to committing our changes to the ring. The most
      obvious point then is to grab the seqno inside intel_ring_begin(), and
      then to reuse that seqno for all ring operations until the next request.
      As intel_ring_begin() can fail, the callers must already be prepared to
      handle such failure and so we can safely add further checks.
      
      This patch looks like it should be split up into the interface
      changes and the tweaks to move seqno wrapping from the execbuffer into
      the core seqno increment. However, I found no easy way to break it into
      incremental steps without introducing further broken behaviour.
      
      v2: Mika found a silly mistake and a subtle error in the existing code;
      inside i915_gem_retire_requests() we were resetting the sync_seqno of
      the target ring based on the seqno from this ring - which are only
      related by the order of their allocation, not retirement. Hence we were
      applying the optimisation that the rings were synchronised too early;
      fortunately the only real casualty there is the handling of seqno
      wrapping.
      
      v3: Do not forget to reset the sync_seqno upon module reinitialisation,
      ala resume.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
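      
      A rough sketch of the ordering, with invented names (ring_begin,
      handle_seqno_wrap; the wrap test below is deliberately simplified): the
      seqno is reserved before any ring write, while failure can still be
      reported cleanly, and is then reused for all ring operations until the
      next request.
      
      ```c
      #include <stdint.h>
      
      struct ring {
              uint32_t outstanding_seqno; /* 0: no request open yet */
              uint32_t next_seqno;        /* next value to hand out */
      };
      
      static int handle_seqno_wrap(struct ring *ring)
      {
              /* real driver: idle the GPU and reset per-ring sync_seqno
               * bookkeeping before reusing low seqno values */
              ring->next_seqno = 1;
              return 0;
      }
      
      /* grab the seqno before any ring writes, while failure is still safe
       * to report; every write until the next request reuses this seqno */
      int ring_begin(struct ring *ring, int num_dwords)
      {
              if (!ring->outstanding_seqno) {
                      if (ring->next_seqno == 0 && handle_seqno_wrap(ring))
                              return -1;
                      ring->outstanding_seqno = ring->next_seqno++;
              }
              (void)num_dwords; /* then wait for ring space as before */
              return 0;
      }
      ```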
  20. 12 Nov 2012, 1 commit
  21. 18 Oct 2012, 1 commit
    • drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers · d7d4eedd
      Authored by Chris Wilson
      With the introduction of per-process GTT space, the hardware designers
      thought it wise to also limit the ability to write to MMIO space to only
      a "secure" batch buffer. The ability to rewrite registers is the only
      way to program the hardware to perform certain operations like scanline
      waits (required for tear-free windowed updates). So we either have a
      choice of adding an interface to perform those synchronized updates
      inside the kernel, or we permit certain processes the ability to write
      to the "safe" registers from within its command stream. This patch
      exposes the ability to submit a SECURE batch buffer to
      DRM_ROOT_ONLY|DRM_MASTER processes.
      
      v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
      bit (bit 13, accidentally not set). Also add a comment explaining why
      secure batches need a global gtt binding.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      [danvet: added hsw fixup.]
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
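      
      A sketch of the permission gate, hedged: the flag bit value and the
      struct fields below are illustrative, not the exact uapi definitions.
      
      ```c
      #include <errno.h>
      #include <stdbool.h>
      
      #define EXEC_SECURE (1u << 9) /* illustrative flag bit */
      
      struct drm_file { bool is_master; bool is_root; };
      
      /* a SECURE batch may write MMIO registers, so only allow it for
       * callers that pass the DRM_ROOT_ONLY|DRM_MASTER check; secure
       * batches must also be bound through the global gtt (see above) */
      int check_exec_flags(const struct drm_file *file, unsigned int flags)
      {
              if ((flags & EXEC_SECURE) && !(file->is_root && file->is_master))
                      return -EPERM;
              return 0;
      }
      ```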
  22. 10 Aug 2012, 1 commit
    • drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8
      Authored by Chris Wilson
      Avoid the forcewake overhead when simply retiring requests, as often the
      last seen seqno is good enough to satisfy the retirement process and will
      be promptly re-run in any case. Only ensure that we force the coherent
      seqno read when we are explicitly waiting upon a completion event to be
      sure that none go missing, and also for when we are reporting seqno
      values in case of error or debugging.
      
      This greatly reduces the load for userspace using the busy-ioctl to
      track active buffers, for instance halving the CPU used by X in pushing
      the pixels from a software render (flash). The effect will be even more
      magnified with userptr and so providing a zero-copy upload path in that
      instance, or in similar instances where X is simply compositing DRI
      buffers.
      
      v2: Reverse the polarity of the tachyon stream. Daniel suggested that
      'force' was too generic for the parameter name and that 'lazy_coherency'
      better encapsulated the semantics of it being an optimization and its
      purpose. Also notice that gen6_get_seqno() is only used by gen6/7
      chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
      function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
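      
      A sketch of the lazy_coherency split, with stand-in functions:
      retirement tolerates a slightly stale seqno (it will be re-run
      promptly anyway), while explicit waits force the coherent read and
      pay the forcewake cost.
      
      ```c
      #include <stdint.h>
      #include <stdbool.h>
      
      static uint32_t hws_seqno; /* stand-in for the hardware status page */
      
      static uint32_t read_seqno_fast(void)
      {
              return hws_seqno; /* cheap read, may lag the hardware slightly */
      }
      
      static uint32_t read_seqno_coherent(void)
      {
              /* real driver: take forcewake / force a posting read first */
              return hws_seqno;
      }
      
      /* retire_requests passes lazy_coherency=true; __wait_seqno and the
       * error/debug paths pass false so no completion can go missing */
      uint32_t get_seqno(bool lazy_coherency)
      {
              return lazy_coherency ? read_seqno_fast() : read_seqno_coherent();
      }
      ```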
  23. 26 Jul 2012, 2 commits
  24. 20 Jun 2012, 1 commit
    • drm/i915: disable flushing_list/gpu_write_list · cc889e0f
      Authored by Daniel Vetter
      This is just the minimal patch to disable all this code so that we can
      do decent amounts of QA before we rip it all out.
      
      The complicating thing is that we need to flush the gpu caches after
      the batchbuffer is emitted, which is past the point of no return where
      execbuffer can't fail any more (otherwise we risk submitting the same
      batch multiple times).
      
      Hence we need to add a flag to track whether any caches associated
      with that ring are dirty, and emit the flush in add_request if that's
      the case.
      
      Note that this has quite a few behaviour changes:
      - Caches get flushed/invalidated unconditionally.
      - Invalidation now happens after potential inter-ring sync.
      
      I've bantered around a bit with Chris on irc whether this fixes
      anything, and it might or might not. The only thing clear is that with
      these changes it's much easier to reason about correctness.
      
      Also rip out a lone get_next_request_seqno in the execbuffer
      retire_commands function. I've dug around and I couldn't figure out
      why that is still there; with the outstanding lazy request stuff it
      shouldn't be necessary.
      
      v2: Chris Wilson complained that I also invalidate the read caches
      when flushing after a batchbuffer. Now optimized.
      
      v3: Added some comments to explain the new flushing behaviour.
      
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  25. 14 Jun 2012, 3 commits
    • drm/i915: possibly invalidate TLB before context switch · 12b0286f
      Authored by Ben Widawsky
      From http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol1_Part3.pdf
      
      [DevSNB] If Flush TLB invalidation Mode is enabled it's the driver's
      responsibility to invalidate the TLBs at least once after the previous
      context switch after any GTT mappings changed (including new GTT
      entries).  This can be done by a pipelined PIPE_CONTROL with TLB inv bit
      set immediately before MI_SET_CONTEXT.
      
      On GEN7 the invalidation mode is explicitly set, but this appears to be
      lacking for GEN6. Since I don't know the history on this, I've decided
      to dynamically read the value at ring init time, and use that value
      throughout.
      
      v2: better comment (daniel)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
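      
      A heavily hedged sketch of the shape of this: the register bit and the
      emit helpers below are placeholders, not real GEN6 encodings. The mode
      is sampled once at ring init and consulted on every context switch.
      
      ```c
      #include <stdbool.h>
      #include <stdint.h>
      
      /* placeholder bit: does the hw invalidate TLBs on every flush? */
      #define GFX_TLB_INVALIDATE_ALWAYS (1u << 13)
      
      static bool itlb_before_ctx_switch;
      
      /* sample the mode once at ring init and use that value throughout */
      void ring_init_tlb_mode(uint32_t gfx_mode)
      {
              itlb_before_ctx_switch = !(gfx_mode & GFX_TLB_INVALIDATE_ALWAYS);
      }
      
      static void emit_pipe_control_tlb_inv(void)
      {
              /* pipelined PIPE_CONTROL with the TLB invalidate bit set */
      }
      
      static void emit_mi_set_context(uint32_t ctx_gtt_addr)
      {
              (void)ctx_gtt_addr;
      }
      
      void do_context_switch(uint32_t ctx_gtt_addr)
      {
              if (itlb_before_ctx_switch)
                      emit_pipe_control_tlb_inv(); /* per the [DevSNB] note above */
              emit_mi_set_context(ctx_gtt_addr);
      }
      ```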
    • drm/i915: context switch implementation · e0556841
      Authored by Ben Widawsky
      Implement the context switch code as well as the interfaces to do the
      context switch. This patch also doesn't match 1:1 with the RFC patches.
      The main difference is that from Daniel's responses the last context
      object is now stored instead of the last context. This allows us
      to free the context data structure and the context object independently.
      
      There is room for optimization: this code will pin the context object
      until the next context is active. The optimal way to do it is to
      actually pin the object, move it to the active list, do the context
      switch, and then unpin it. This allows the eviction code to actually
      evict the context object if needed.
      
      The context switch code is missing workarounds, they will be implemented
      in future patches.
      
      v2: actually do obj->dirty=1 in switch (daniel)
      Modified comment around above
      Remove flags to context switch (daniel)
      Move mi_set_context code to i915_gem_context.c (daniel)
      Remove seqno, use lazy request instead (daniel)
      
      v3: use i915_gem_request_next_seqno instead of
            outstanding_lazy_request (Daniel)
      remove id's from trace events (Daniel)
      Put the context BO in the instruction domain (Daniel)
      Don't unref the BO if context switch fails (Chris)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
    • drm/i915: context basic create & destroy · 40521054
      Authored by Ben Widawsky
      Invent an abstraction for a hw context which is passed around through
      the core functions. The main bit a hw context holds is the buffer object
      which backs the context. The rest of the members are just helper
      functions. Specifically the ring member, which could likely go away if
      we decide to never implement whatever other hw context support exists.
      
      Of note here is the introduction of the 64k alignment constraint for the
      BO. If contexts become heavily used, we should consider tweaking this
      down to 4k. Until the contexts are merged and tested a bit though, I
      think 64k is a nice start (based on docs).
      
      Since we don't yet switch contexts, there is really not much complexity
      here. Creation/destruction works pretty much as one would expect. An idr
      is used to generate the context id numbers which are unique per file
      descriptor.
      
      v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
      convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
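      
      A user-space sketch of create/destroy with per-fd ids; a plain counter
      stands in for the idr, and malloc stands in for allocating the
      64k-aligned buffer object.
      
      ```c
      #include <stdint.h>
      #include <stdlib.h>
      
      #define CONTEXT_SIZE (64 * 1024) /* 64k, per the alignment note above */
      
      struct hw_context {
              uint32_t id;
              void *obj; /* stand-in for the BO backing the context */
      };
      
      struct file_priv {
              uint32_t next_ctx_id; /* a plain counter standing in for the idr */
      };
      
      /* context ids are unique per file descriptor, as with the idr scheme */
      struct hw_context *create_hw_context(struct file_priv *file)
      {
              struct hw_context *ctx = calloc(1, sizeof(*ctx));
      
              if (!ctx)
                      return NULL;
              ctx->obj = malloc(CONTEXT_SIZE); /* real driver: 64k-aligned BO */
              if (!ctx->obj) {
                      free(ctx);
                      return NULL; /* caller distinguishes the ENOMEM case */
              }
              ctx->id = ++file->next_ctx_id;
              return ctx;
      }
      
      void destroy_hw_context(struct hw_context *ctx)
      {
              free(ctx->obj);
              free(ctx);
      }
      ```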
  26. 20 May 2012, 1 commit
    • drm/i915: Introduce for_each_ring() macro · b4519513
      Authored by Chris Wilson
      In many places we wish to iterate over the rings associated with the
      GPU, so refactor them to use a common macro.
      
      Along the way, there are a few code removals that should be side-effect
      free and some rearrangement which should only have a cosmetic impact,
      such as error-state.
      
      Note that this slightly changes the semantics in the hangcheck code:
      We now always cycle through all enabled rings instead of
      short-circuiting the logic.
      
      v2: Pull in a couple of suggestions from Ben and Daniel for
      intel_ring_initialized() and not removing the warning (just moving them
      to a new home, closer to the error).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Added note to commit message about the small behaviour
      change, suggested by Ben Widawsky.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
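      
      A self-contained sketch of such an iterator macro; the field and macro
      names approximate the driver's rather than being copied from it. It
      visits only rings that were actually initialized, so callers like
      hangcheck cycle through every enabled ring.
      
      ```c
      #include <stdbool.h>
      #include <stdio.h>
      
      #define NUM_RINGS 3
      
      struct intel_ring { const char *name; bool initialized; };
      struct dev_priv { struct intel_ring ring[NUM_RINGS]; };
      
      /* iterate over enabled rings only; the assignment happens inside the
       * condition so the loop body sees the current ring pointer */
      #define for_each_ring(ring__, dev_priv__, i__) \
              for ((i__) = 0; (i__) < NUM_RINGS; (i__)++) \
                      if (((ring__) = &(dev_priv__)->ring[(i__)])->initialized)
      
      int main(void)
      {
              struct dev_priv dp = {
                      .ring = { { "render", true }, { "bsd", false }, { "blt", true } },
              };
              struct intel_ring *ring;
              int i;
      
              for_each_ring(ring, &dp, i)
                      printf("ring %d: %s\n", i, ring->name);
              return 0; /* prints the render and blt rings, skips bsd */
      }
      ```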
  27. 03 May 2012, 1 commit