1. 20 Feb, 2013 2 commits
  2. 23 Jan, 2013 2 commits
  3. 22 Jan, 2013 1 commit
  4. 20 Jan, 2013 2 commits
  5. 18 Jan, 2013 1 commit
  6. 19 Dec, 2012 2 commits
  7. 18 Dec, 2012 1 commit
    • drm/i915: Implement workaround for broken CS tlb on i830/845 · b45305fc
      Committed by Daniel Vetter
      Now that Chris Wilson has demonstrated that the key to stability on early
      gen 2 is to simply _never_ exchange the physical backing storage of
      batch buffers, I've taken a stab at a kernel solution. It doesn't look too
      nefarious imho, now that I no longer try to be too clever for my own
      good.
      
      v2: After discussing the various techniques, we've decided to always blit
      batches on the suspect devices, but allow userspace to opt out of the
      kernel workaround and assume full responsibility for providing coherent
      batches. The principal reason is that avoiding the blit does improve
      performance in a few key microbenchmarks and also in cairo-trace
      replays.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet:
      - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
        wrap w/a. Suggested by Chris Wilson.
      - Also add the ACTHD check from Chris Wilson for the error state
        dumping, so that we still catch batches when userspace opts out of
        the w/a.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
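      
      A minimal sketch of the dispatch idea described above; every name here
      (types, helpers, fields) is an illustrative placeholder rather than the
      driver's real code. By default the kernel blits the user batch into one
      pinned scratch buffer whose backing pages never change, and only runs
      the batch in place when userspace has opted out of the workaround:
      
      /* Hypothetical sketch of the i830/845 CS TLB workaround path. */
      #include <stdbool.h>
      #include <stdint.h>
      
      struct example_ring {
              uint32_t scratch_offset;  /* fixed GTT offset of the pinned scratch bo */
      };
      
      /* Assumed emit helpers, declared only so the sketch is self-contained. */
      extern int example_emit_blit(struct example_ring *ring, uint32_t dst,
                                   uint32_t src, uint32_t len);
      extern int example_emit_batch_start(struct example_ring *ring, uint32_t offset);
      
      static int i830_dispatch_batch(struct example_ring *ring, uint32_t batch_offset,
                                     uint32_t batch_len, bool userspace_opted_out)
      {
              int ret;
      
              if (userspace_opted_out)
                      /* Userspace promised coherent batches: run in place. */
                      return example_emit_batch_start(ring, batch_offset);
      
              /* Workaround: copy the batch into the scratch bo and run the copy,
               * so the CS never sees the backing storage of a batch change. */
              ret = example_emit_blit(ring, ring->scratch_offset,
                                      batch_offset, batch_len);
              if (ret)
                      return ret;
      
              return example_emit_batch_start(ring, ring->scratch_offset);
      }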
  8. 11 Dec, 2012 1 commit
  9. 06 Dec, 2012 2 commits
  10. 04 Dec, 2012 1 commit
  11. 01 Dec, 2012 1 commit
  12. 29 Nov, 2012 2 commits
    • drm/i915: Rearrange code to only have a single method for waiting upon the ring · 3e960501
      Committed by Chris Wilson
      Replace the wait for the ring to be clear with the more common wait for
      the ring to be idle. The principal advantage is one less exported
      intel_ring_wait function, and the removal of a hardcoded value.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
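      
      As a rough illustration of "idle" versus "clear" (names below are
      hypothetical, not the exported driver API): idling the ring means
      waiting for the last request it emitted to complete, instead of waiting
      for some hardcoded amount of free ring space.
      
      #include <stdint.h>
      
      struct example_ring {
              uint32_t last_emitted_seqno;
      };
      
      /* Assumed wait primitive. */
      extern int example_wait_seqno(struct example_ring *ring, uint32_t seqno);
      
      /* Idle: everything submitted to this ring so far has completed. */
      static int example_ring_idle(struct example_ring *ring)
      {
              return example_wait_seqno(ring, ring->last_emitted_seqno);
      }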
    • drm/i915: Preallocate next seqno before touching the ring · 9d773091
      Committed by Chris Wilson
      Based on the work by Mika Kuoppala, we realised that we need to handle
      seqno wraparound prior to committing our changes to the ring. The most
      obvious point then is to grab the seqno inside intel_ring_begin(), and
      then to reuse that seqno for all ring operations until the next request.
      As intel_ring_begin() can fail, the callers must already be prepared to
      handle such failure and so we can safely add further checks.
      
      This patch looks like it should be split up into the interface
      changes and the tweaks to move seqno wrapping from the execbuffer into
      the core seqno increment. However, I found no easy way to break it into
      incremental steps without introducing further broken behaviour.
      
      v2: Mika found a silly mistake and a subtle error in the existing code;
      inside i915_gem_retire_requests() we were resetting the sync_seqno of
      the target ring based on the seqno from this ring - but the two are
      related only by the order of their allocation, not retirement. Hence we
      were applying the "rings already synchronised" optimisation too early;
      fortunately the only real casualty there is the handling of seqno
      wrapping.
      
      v3: Do not forget to reset the sync_seqno upon module reinitialisation,
      e.g. on resume.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
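      
      A simplified sketch of the flow described above, with illustrative names
      only: the next seqno is reserved (and wraparound handled) as part of the
      ring_begin step, while failure is still safe, and that seqno is then
      reused for every ring operation up to the next request.
      
      #include <stdint.h>
      
      struct example_device {
              uint32_t next_seqno;            /* 0 means "about to wrap" */
      };
      
      struct example_ring {
              uint32_t outstanding_seqno;     /* 0 means "none reserved yet" */
      };
      
      /* Assumed helper that copes with wraparound (e.g. resetting sync state). */
      extern int example_handle_seqno_wrap(struct example_device *dev);
      
      static int example_ring_begin(struct example_device *dev,
                                    struct example_ring *ring)
      {
              if (ring->outstanding_seqno == 0) {
                      if (dev->next_seqno == 0) {
                              int ret = example_handle_seqno_wrap(dev);
                              if (ret)
                                      return ret;
                              dev->next_seqno = 1;
                      }
                      ring->outstanding_seqno = dev->next_seqno++;
              }
              return 0;   /* ...followed by the usual wait-for-ring-space logic */
      }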
  13. 22 Nov, 2012 1 commit
  14. 16 Nov, 2012 1 commit
  15. 12 Nov, 2012 4 commits
  16. 18 Oct, 2012 1 commit
    • drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers · d7d4eedd
      Committed by Chris Wilson
      With the introduction of per-process GTT space, the hardware designers
      thought it wise to also limit the ability to write to MMIO space to only
      a "secure" batch buffer. The ability to rewrite registers is the only
      way to program the hardware to perform certain operations like scanline
      waits (required for tear-free windowed updates). So we have a choice:
      either add an interface to perform those synchronized updates inside
      the kernel, or permit certain processes to write to the "safe"
      registers from within their own command streams. This patch exposes
      the ability to submit a SECURE batch buffer to DRM_ROOT_ONLY|DRM_MASTER
      processes.
      
      v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
      bit (bit 13, accidentally not set). Also add a comment explaining why
      secure batches need a global gtt binding.
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      [danvet: added hsw fixup.]
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
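      
      For illustration, the userspace side looks roughly like this: a DRM
      master (or root-only) process sets I915_EXEC_SECURE on execbuffer2. The
      ioctl, struct and flags are the real uapi; the surrounding setup of the
      fd and exec object list is assumed and omitted.
      
      #include <stdint.h>
      #include <string.h>
      #include <sys/ioctl.h>
      #include <drm/i915_drm.h>
      
      static int submit_secure_batch(int fd, struct drm_i915_gem_exec_object2 *objects,
                                     unsigned int count, unsigned int batch_len)
      {
              struct drm_i915_gem_execbuffer2 execbuf;
      
              memset(&execbuf, 0, sizeof(execbuf));
              execbuf.buffers_ptr = (uintptr_t)objects;
              execbuf.buffer_count = count;
              execbuf.batch_len = batch_len;
              execbuf.flags = I915_EXEC_RENDER | I915_EXEC_SECURE;
      
              /* Only DRM_MASTER (or root-only) callers may set I915_EXEC_SECURE;
               * everyone else gets -EPERM. */
              return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
      }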
  17. 03 Oct, 2012 2 commits
  18. 20 Sep, 2012 1 commit
    • drm/i915: Replace the array of pages with a scatterlist · 9da3da66
      Committed by Chris Wilson
      Rather than have multiple data structures for describing our page layout
      in conjunction with the array of pages, we can migrate all users over to
      a scatterlist.
      
      One major advantage this offers, other than unifying the page tracking
      structures, is that we replace the vmalloc'ed array (which can be up to
      a megabyte in size) with a chain of individual pages, which helps
      reduce memory pressure.
      
      The disadvantage is that we then do not have a simple array to iterate,
      or to access randomly. The common case for this is in the relocation
      processing, which will typically fit within a single scatterlist page
      and so be almost the same cost as the simple array. For iterating over
      the array, the extra function call could be optimised away, but in
      reality it is an insignificant cost next to either binding the pages
      or performing the pwrite/pread.
      
      v2: Fix drm_clflush_sg() to not invoke wbinvd as well! And fix the
      trivial compile error from rebasing.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
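      
      The gist of the migration, as a small sketch against the kernel's
      standard scatterlist API (the function name and the source of the page
      array are placeholders): the vmalloc'ed struct page *[] becomes an
      sg_table that callers walk with for_each_sg().
      
      #include <linux/err.h>
      #include <linux/mm.h>
      #include <linux/scatterlist.h>
      #include <linux/slab.h>
      
      static struct sg_table *example_build_sg(struct page **pages, unsigned int n)
      {
              struct sg_table *st;
              struct scatterlist *sg;
              unsigned int i;
      
              st = kmalloc(sizeof(*st), GFP_KERNEL);
              if (!st)
                      return ERR_PTR(-ENOMEM);
      
              if (sg_alloc_table(st, n, GFP_KERNEL)) {
                      kfree(st);
                      return ERR_PTR(-ENOMEM);
              }
      
              /* One page per entry; a real implementation may coalesce runs of
               * physically contiguous pages into larger segments. */
              for_each_sg(st->sgl, sg, n, i)
                      sg_set_page(sg, pages[i], PAGE_SIZE, 0);
      
              return st;
      }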
  19. 03 Sep, 2012 3 commits
    • drm/i915: add workarounds to gen7_render_ring_flush · f3987631
      Committed by Paulo Zanoni
      From Bspec, Vol 2a, Section 1.9.3.4 "PIPE_CONTROL", intro section
      detailing the various workarounds:
      
      "[DevIVB {W/A}, DevHSW {W/A}]: Pipe_control with CS-stall bit
      set must be issued before a pipe-control command that has the State
      Cache Invalidate bit set."
      
      Note that the public Bspec has different numbering; it's Vol2Part1,
      Section 1.10.4.1 "PIPE_CONTROL" there.
      
      There's also a second workaround for the PIPE_CONTROL command itself:
      
      "[DevIVB, DevVLV, DevHSW] {WA}: Every 4th PIPE_CONTROL command, not
      counting the PIPE_CONTROL with only read-cache-invalidate bit(s) set,
      must have a CS_STALL bit set"
      
      For simplicity we simply set the CS_STALL bit on every pipe_control on
      gen7+.
      
      Note that this massively helps on some hsw machines; together with the
      following patch to unconditionally set the CS_STALL bit on every
      pipe_control, it prevents a gpu hang every few seconds.
      
      This is a regression that was introduced in the pipe_control
      cleanup:
      
      commit 6c6cf5aa
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Jul 20 18:02:28 2012 +0100
      
          drm/i915: Only apply the SNB pipe control w/a to gen6
      
      It looks like the massive snb pipe_control workaround also papered
      over any issues on ivb and hsw.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      [danvet: squashed both workarounds together, pimped the commit message
      with Bspec citations and the regression commit citation, and changed the
      comment in the code a bit to clarify that we unconditionally set
      CS_STALL to avoid being hurt by trying to be clever.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
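      
      A condensed sketch of the two workarounds together (the emit helper and
      flag macros below are illustrative stand-ins for the driver's own): any
      flush that invalidates the state cache is preceded by a CS-stall-only
      PIPE_CONTROL, and CS_STALL is set unconditionally so the "every 4th
      PIPE_CONTROL" rule can never be violated.
      
      #include <stdint.h>
      
      #define EX_PIPE_CONTROL_CS_STALL               (1u << 20)
      #define EX_PIPE_CONTROL_STALL_AT_SCOREBOARD    (1u << 1)
      #define EX_PIPE_CONTROL_STATE_CACHE_INVALIDATE (1u << 2)
      
      struct example_ring;
      extern int example_emit_pipe_control(struct example_ring *ring, uint32_t flags);
      
      static int example_gen7_render_flush(struct example_ring *ring, uint32_t flags)
      {
              /* W/A: a PIPE_CONTROL with CS_STALL set must be issued before any
               * PIPE_CONTROL that sets State Cache Invalidate. */
              if (flags & EX_PIPE_CONTROL_STATE_CACHE_INVALIDATE) {
                      int ret = example_emit_pipe_control(ring,
                                      EX_PIPE_CONTROL_CS_STALL |
                                      EX_PIPE_CONTROL_STALL_AT_SCOREBOARD);
                      if (ret)
                              return ret;
              }
      
              /* W/A: rather than count PIPE_CONTROLs, always set CS_STALL. */
              return example_emit_pipe_control(ring, flags | EX_PIPE_CONTROL_CS_STALL);
      }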
    • drm/i915: add workarounds directly to gen6_render_ring_flush · b3111509
      Committed by Paulo Zanoni
      The workarounds can now go directly into gen6_render_ring_flush, since
      gen 7+ now runs the new gen7_render_ring_flush function.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: add gen7_render_ring_flush · 4772eaeb
      Committed by Paulo Zanoni
      For now, just a copy of gen6_render_ring_flush. Different gens have
      different workarounds, so we want different functions.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  20. 24 Aug, 2012 1 commit
  21. 14 Aug, 2012 1 commit
  22. 10 Aug, 2012 1 commit
    • drm/i915: Lazily apply the SNB+ seqno w/a · b2eadbc8
      Committed by Chris Wilson
      Avoid the forcewake overhead when simply retiring requests, as often the
      last seen seqno is good enough to satisfy the retirement process and it
      will be promptly re-run in any case. Only force the coherent seqno read
      when we are explicitly waiting upon a completion event, to be sure that
      none go missing, and also when we are reporting seqno values in case of
      error or debugging.
      
      This greatly reduces the load for userspace using the busy-ioctl to
      track active buffers, for instance halving the CPU used by X in pushing
      the pixels from a software render (flash). The effect will be even more
      magnified with userptr providing a zero-copy upload path in that
      instance, or in similar instances where X is simply compositing DRI
      buffers.
      
      v2: Reverse the polarity of the tachyon stream. Daniel suggested that
      'force' was too generic for the parameter name and that 'lazy_coherency'
      better encapsulated the semantics of it being an optimization and its
      purpose. Also notice that gen6_get_seqno() is only used by gen6/7
      chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
      function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
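      
      A sketch of the lazy_coherency split with hypothetical helper names: the
      cheap status-page read is enough for routine request retirement, and the
      expensive coherent read (forcing the seqno write to land, e.g. via a
      posting read of a CS register) only happens when a caller is explicitly
      waiting on, or reporting, a seqno.
      
      #include <stdbool.h>
      #include <stdint.h>
      
      struct example_ring;
      
      /* Cheap read of the seqno last written by the GPU to the status page. */
      extern uint32_t example_read_status_page_seqno(struct example_ring *ring);
      /* Expensive read of a CS register, flushing any pending seqno write. */
      extern void example_force_seqno_coherency(struct example_ring *ring);
      
      static uint32_t example_get_seqno(struct example_ring *ring, bool lazy_coherency)
      {
              /* Retirement can tolerate a slightly stale value; explicit waits
               * and error/debug reporting cannot. */
              if (!lazy_coherency)
                      example_force_seqno_coherency(ring);
      
              return example_read_status_page_seqno(ring);
      }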
  23. 08 Aug, 2012 2 commits
  24. 26 Jul, 2012 3 commits
  25. 20 Jul, 2012 1 commit