1. 03 4月, 2014 2 次提交
    • C
      drm/i915: Move all ring resets before setting the HWS page · 9991ae78
      Chris Wilson 提交于
      In commit a51435a3
      Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
      Date:   Wed Mar 12 16:39:40 2014 +0530
      
          drm/i915: disable rings before HW status page setup
      
      we reordered stopping the rings to do so before we set the HWS register.
      However, there is an extra workaround for g45 to reset the rings twice,
      and for consistency we should apply that workaround before setting the
      HWS to be sure that the rings are truly stopped.
      
      Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      9991ae78
    • B
      drm/i915: Invariably invalidate before ctx switch · 057f6a8a
      Ben Widawsky 提交于
      We have been setting the bit which was originally BIOS dependent since:
      commit f05bb0c7
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sun Jan 20 16:33:32 2013 +0000
      
          drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits
      
      Therefore, we do not need to try to figure it out dynamically and we can
      just always invalidate the TLBs.
      
      It's a partial revert of:
      commit 12b0286f
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Mon Jun 4 14:42:50 2012 -0700
      
          drm/i915: possibly invalidate TLB before context switch
      
      The original commit attempted to only invalidate when necessary
      (very much a relic from the old days). Now, we can just always invalidate.
      
      I guess the old TODO still exists. Since we seem to have abandoned ILK
      contexts however, there isn't much point in even remembering.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      057f6a8a
  2. 29 3月, 2014 1 次提交
    • C
      drm/i915: Broadwell expands ACTHD to 64bit · 50877445
      Chris Wilson 提交于
      As Broadwell has an increased virtual address size, it requires more
      than 32 bits to store offsets into its address space. This includes the
      debug registers to track the current HEAD of the individual rings, which
      may be anywhere within the per-process address spaces. In order to find
      the full location, we need to read the high bits from a second register.
      We then also need to expand our storage to keep track of the larger
      address.
      
      v2: Carefully read the two registers to catch wraparound between
          the reads.
      v3: Use a WARN_ON rather than loop indefinitely on an unstable
          register read.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ben Widawsky <benjamin.widawsky@intel.com>
      Cc: Timo Aaltonen <tjaalton@ubuntu.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Drop spurious hunk which conflicted.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      50877445
  3. 12 3月, 2014 1 次提交
  4. 08 3月, 2014 1 次提交
    • B
      drm/i915: Implement command buffer parsing logic · 351e3db2
      Brad Volkin 提交于
      The command parser scans batch buffers submitted via execbuffer ioctls before
      the driver submits them to hardware. At a high level, it looks for several
      things:
      
      1) Commands which are explicitly defined as privileged or which should only be
         used by the kernel driver. The parser generally rejects such commands, with
         the provision that it may allow some from the drm master process.
      2) Commands which access registers. To support correct/enhanced userspace
         functionality, particularly certain OpenGL extensions, the parser provides a
         whitelist of registers which userspace may safely access (for both normal and
         drm master processes).
      3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
         parser always rejects such commands.
      
      See the overview comment in the source for more details.
      
      This patch only implements the logic. Subsequent patches will build the tables
      that drive the parser.
      
      v2: Don't set the secure bit if the parser succeeds
      Fail harder during init
      Makefile cleanup
      Kerneldoc cleanup
      Clarify module param description
      Convert ints to bools in a few places
      Move client/subclient defs to i915_reg.h
      Remove the bits_count field
      
      OTC-Tracker: AXIA-4631
      Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
      Signed-off-by: NBrad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: NJani Nikula <jani.nikula@intel.com>
      [danvet: Appease checkpatch.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      351e3db2
  5. 12 2月, 2014 1 次提交
  6. 04 2月, 2014 1 次提交
  7. 10 9月, 2013 1 次提交
    • C
      drm/i915: Write RING_TAIL once per-request · 09246732
      Chris Wilson 提交于
      Ignoring the legacy DRI1 code, and a couple of special cases (to be
      discussed later), all access to the ring is mediated through requests.
      The first write to a ring will grab a seqno and mark the ring as having
      an outstanding_lazy_request. Either through explicitly adding a request
      after an execbuffer or through an implicit wait (either by the CPU or by
      a semaphore), that sequence of writes will be terminated with a request.
      So we can ellide all the intervening writes to the tail register and
      send the entire command stream to the GPU at once. This will reduce the
      number of *serialising* writes to the tail register by a factor or 3-5
      times (depending upon architecture and number of workarounds, context
      switches, etc involved). This becomes even more noticeable when the
      register write is overloaded with a number of debugging tools. The
      astute reader will wonder if it is then possible to overflow the ring
      with a single command. It is not. When we start a command sequence to
      the ring, we check for available space and issue a wait in case we have
      not. The ring wait will in this case be forced to flush the outstanding
      register write and then poll the ACTHD for sufficient space to continue.
      
      The exception to the rule where everything is inside a request are a few
      initialisation cases where we may want to write GPU commands via the CS
      before userspace wakes up and page flips.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      09246732
  8. 06 9月, 2013 1 次提交
  9. 05 9月, 2013 2 次提交
  10. 04 9月, 2013 1 次提交
  11. 23 8月, 2013 1 次提交
  12. 22 8月, 2013 1 次提交
  13. 11 7月, 2013 2 次提交
    • D
      drm/i915: unify ring irq refcounts (again) · c7113cc3
      Daniel Vetter 提交于
      With the simplified locking there's no reason any more to keep the
      refcounts seperate.
      
      v2: Readd the lost comment that ring->irq_refcount is protected by
      dev_priv->irq_lock.
      Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      c7113cc3
    • D
      drm/i915: kill dev_priv->rps.lock · 59cdb63d
      Daniel Vetter 提交于
      Now that the rps interrupt locking isn't clearly separated (at elast
      conceptually) from all the other interrupt locking having a different
      lock stopped making sense: It protects much more than just the rps
      workqueue it started out with. But with the addition of VECS the
      separation started to blurr and resulted in some more complex locking
      for the ring interrupt refcount.
      
      With this we can (again) unifiy the ringbuffer irq refcounts without
      causing a massive confusion, but that's for the next patch.
      
      v2: Explain better why the rps.lock once made sense and why no longer,
      requested by Ben.
      Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      59cdb63d
  14. 13 6月, 2013 1 次提交
  15. 11 6月, 2013 1 次提交
    • C
      drm/i915: Don't count semaphore waits towards a stuck ring · 6274f212
      Chris Wilson 提交于
      If we detect a ring is in a valid wait for another, just let it be.
      Eventually it will either begin to progress again, or the entire system
      will come grinding to a halt and then hangcheck will fire as soon as the
      deadlock is detected.
      
      This error was foretold by Ben in
      commit 05407ff8
      Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Date:   Thu May 30 09:04:29 2013 +0300
      
          drm/i915: detect hang using per ring hangcheck_score
      
      "If ring B is waiting on ring A via semaphore, and ring A is making
      progress, albeit slowly - the hangcheck will fire. The check will
      determine that A is moving, however ring B will appear hung because
      the ACTHD doesn't move. I honestly can't say if that's actually a
      realistic problem to hit it probably implies the timeout value is too
      low."
      
      v2: Make sure we don't even incur the KICK cost whilst waiting.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      6274f212
  16. 07 6月, 2013 1 次提交
  17. 03 6月, 2013 1 次提交
    • M
      drm/i915: detect hang using per ring hangcheck_score · 05407ff8
      Mika Kuoppala 提交于
      Keep track of ring seqno progress and if there are no
      progress detected, declare hang. Use actual head (acthd)
      to distinguish between ring stuck and batchbuffer looping
      situation. Stuck ring will be kicked to trigger progress.
      
      This commit adds a hard limit for batchbuffer completion time.
      If batchbuffer completion time is more than 4.5 seconds,
      the gpu will be declared hung.
      
      Review comment from Ben which nicely clarifies the semantic change:
      
      "Maybe I'm just stating the functional changes of the patch, but in case
      they were unintended here is what I see as potential issues:
      
      1. "If ring B is waiting on ring A via semaphore, and ring A is making
         progress, albeit slowly - the hangcheck will fire. The check will
         determine that A is moving, however ring B will appear hung because
         the ACTHD doesn't move. I honestly can't say if that's actually a
         realistic problem to hit it probably implies the timeout value is too
         low.
      
      2. "There's also another corner case on the kick. If the seqno = 2
         (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
         we try to kick it... we don't actually try to find out if the kick
         helped"
      
      v2: use atchd to detect stuck ring from loop (Ben Widawsky)
      
      v3: Use acthd to check when ring needs kicking.
      Declare hang on third time in order to give time for
      kick_ring to take effect.
      
      v4: Update commit msg
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: Paste in Ben's review comment.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      05407ff8
  18. 01 6月, 2013 7 次提交
  19. 06 5月, 2013 1 次提交
    • C
      drm/i915: put context upon switching · 112522f6
      Chris Wilson 提交于
      In order to be notified of when the context and all of its associated
      objects is idle (for if the context maps to a ppgtt) we need a callback
      from the retire handler. We can arrange this by using the kref_get/put
      of the context for request tracking and by inserting a request to
      demarque the switch away from the old context.
      
      [Ben: fixed minor error to patch compile, AND s/last_context/from/]
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      112522f6
  20. 19 12月, 2012 2 次提交
  21. 18 12月, 2012 1 次提交
    • D
      drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc
      Daniel Vetter 提交于
      Now that Chris Wilson demonstrated that the key for stability on early
      gen 2 is to simple _never_ exchange the physical backing storage of
      batch buffers I've tried a stab at a kernel solution. Doesn't look too
      nefarious imho, now that I don't try to be too clever for my own good
      any more.
      
      v2: After discussing the various techniques, we've decided to always blit
      batches on the suspect devices, but allow userspace to opt out of the
      kernel workaround assume full responsibility for providing coherent
      batches. The principal reason is that avoiding the blit does improve
      performance in a few key microbenchmarks and also in cairo-trace
      replays.
      Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet:
      - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
        wrap w/a. Suggested by Chris Wilson.
      - Also add the ACTHD check from Chris Wilson for the error state
        dumping, so that we still catch batches when userspace opts out of
        the w/a.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b45305fc
  22. 06 12月, 2012 1 次提交
  23. 04 12月, 2012 1 次提交
  24. 29 11月, 2012 2 次提交
    • C
      drm/i915: Rearrange code to only have a single method for waiting upon the ring · 3e960501
      Chris Wilson 提交于
      Replace the wait for the ring to be clear with the more common wait for
      the ring to be idle. The principle advantage is one less exported
      intel_ring_wait function, and the removal of a hardcoded value.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      3e960501
    • C
      drm/i915: Preallocate next seqno before touching the ring · 9d773091
      Chris Wilson 提交于
      Based on the work by Mika Kuoppala, we realised that we need to handle
      seqno wraparound prior to committing our changes to the ring. The most
      obvious point then is to grab the seqno inside intel_ring_begin(), and
      then to reuse that seqno for all ring operations until the next request.
      As intel_ring_begin() can fail, the callers must already be prepared to
      handle such failure and so we can safely add further checks.
      
      This patch looks like it should be split up into the interface
      changes and the tweaks to move seqno wrapping from the execbuffer into
      the core seqno increment. However, I found no easy way to break it into
      incremental steps without introducing further broken behaviour.
      
      v2: Mika found a silly mistake and a subtle error in the existing code;
      inside i915_gem_retire_requests() we were resetting the sync_seqno of
      the target ring based on the seqno from this ring - which are only
      related by the order of their allocation, not retirement. Hence we were
      applying the optimisation that the rings were synchronised too early,
      fortunately the only real casualty there is the handling of seqno
      wrapping.
      
      v3: Do not forget to reset the sync_seqno upon module reinitialisation,
      ala resume.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      9d773091
  25. 12 11月, 2012 1 次提交
  26. 18 10月, 2012 1 次提交
    • C
      drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers · d7d4eedd
      Chris Wilson 提交于
      With the introduction of per-process GTT space, the hardware designers
      thought it wise to also limit the ability to write to MMIO space to only
      a "secure" batch buffer. The ability to rewrite registers is the only
      way to program the hardware to perform certain operations like scanline
      waits (required for tear-free windowed updates). So we either have a
      choice of adding an interface to perform those synchronized updates
      inside the kernel, or we permit certain processes the ability to write
      to the "safe" registers from within its command stream. This patch
      exposes the ability to submit a SECURE batch buffer to
      DRM_ROOT_ONLY|DRM_MASTER processes.
      
      v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
      bit (bit 13, accidentally not set). Also add a comment explaining why
      secure batches need a global gtt binding.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      [danvet: added hsw fixup.]
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d7d4eedd
  27. 10 8月, 2012 1 次提交
    • C
      drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8
      Chris Wilson 提交于
      Avoid the forcewake overhead when simply retiring requests, as often the
      last seen seqno is good enough to satisfy the retirment process and will
      be promptly re-run in any case. Only ensure that we force the coherent
      seqno read when we are explicitly waiting upon a completion event to be
      sure that none go missing, and also for when we are reporting seqno
      values in case of error or debugging.
      
      This greatly reduces the load for userspace using the busy-ioctl to
      track active buffers, for instance halving the CPU used by X in pushing
      the pixels from a software render (flash). The effect will be even more
      magnified with userptr and so providing a zero-copy upload path in that
      instance, or in similar instances where X is simply compositing DRI
      buffers.
      
      v2: Reverse the polarity of the tachyon stream. Daniel suggested that
      'force' was too generic for the parameter name and that 'lazy_coherency'
      better encapsulated the semantics of it being an optimization and its
      purpose. Also notice that gen6_get_seqno() is only used by gen6/7
      chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
      function.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b2eadbc8
  28. 26 7月, 2012 2 次提交