1. 15 Feb 2012 (2 commits)
    • drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94
      By Chris Wilson
      By recording the location of every request in the ringbuffer, we know
      that in order to retire the request the GPU must have finished reading
      it and so the GPU head is now beyond the tail of the request. We can
      therefore provide a conservative estimate of where the GPU is reading
      from in order to avoid having to read back the ring buffer registers
      when polling for space upon starting a new write into the ringbuffer.
      
      A secondary effect is that this allows us to convert
      intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
      upon the single function to handle the complicated task of waiting upon
      the GPU. A necessary precaution is that we need to make that wait
      uninterruptible to match the existing conditions as all the callers of
      intel_ring_begin() have not been audited to handle ERESTARTSYS
      correctly.
      
      By using a conservative estimate for the head, and always processing all
      outstanding requests first, we prevent a race condition between using
      the estimate and direct reads of I915_RING_HEAD which could result in
      the value of the head going backwards, and the tail overflowing once
      again. We are also careful to mark any request that we skip over in
      order to free space in ring as consumed which provides a
      self-consistency check.
      
      Given sufficient abuse, such as a set of unthrottled GPU bound
      cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
      Sandy Bridge (i5-2520m):
        firefox-paintball  18927ms -> 15646ms: 1.21x speedup
        firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
      which is a mild consolation for the performance those traces achieved from
      exploiting the buggy autoreported head.
      
      v2: Add a few more comments and make request->tail a conservative
      estimate as suggested by Daniel Vetter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: resolve conflicts with retirement deferring and the lack of
      the autoreport head removal (that will go in through -fixes).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: add missing SDVO bits for interlaced modes on ILK · 7c26e5c6
      By Paulo Zanoni
      This was pointed out by Jesse Barnes. The code now seems to follow the
      specification, but I don't have an SDVO device to really test this.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 14 Feb 2012 (3 commits)
  3. 13 Feb 2012 (4 commits)
    • drm/i915: fix up locking inconsistency around gem_do_init · d3ae0810
      By Daniel Vetter
      The locking in our setup and teardown paths is rather arbitrary, but
      generally we try to protect gem stuff with dev->struct_mutex. Further,
      the ums/gem ioctl to set up gem _does_ take the lock. So fix up this
      benign inconsistency.
      
      Noticed while reading through the code.
      
      v2: Rebased on top of the ppgtt code.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: enable forcewake voodoo also for gen6 · 99ffa162
      By Daniel Vetter
      We still have reports of missed irqs even on Sandybridge with the
      HWSTAM workaround in place. Testing by the bug reporter gets rid of
      them with the forcewake voodoo and no HWSTAM writes.
      
      Because I've slightly botched the rebasing I've left out the ACTHD
      readback which is also required to get IVB working. Seems to still
      work on the tester's machine, so I think we should go with the more
      minimal approach on SNB. Especially since I've only found weak evidence
      for holding forcewake while waiting for an interrupt to arrive, but
      none for the ACTHD readback.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45332
      Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: fixup seqno allocation logic for lazy_request · 53d227f2
      By Daniel Vetter
      Currently we reserve seqnos only when we emit the request to the ring
      (by bumping dev_priv->next_seqno), but start using it much earlier for
      ring->outstanding_lazy_request. When two threads compete for the gpu and
      run on two different rings (e.g. ddx on blitter vs. compositor),
      hilarity ensues, especially when we get constantly interrupted while
      reserving buffers.
      
      Breakage seems to have been introduced in
      
      commit 6f392d54
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sat Aug 7 11:01:22 2010 +0100
      
          drm/i915: Use a common seqno for all rings.
      
      This patch fixes up the seqno reservation logic by moving it into
      i915_gem_next_request_seqno. The ring->add_request functions now
      superfluously still return the new seqno through a pointer; that will
      be refactored in the next patch.
      
      Note that with this change we now unconditionally allocate a seqno,
      even when ->add_request might fail because the rings are full and the
      gpu died. But this does not open up a new can of worms because we can
      already leave behind an outstanding_request_seqno if e.g. the caller
      gets interrupted with a signal while stalling for the gpu in the
      eviction paths. And with the bugfix we only ever have one seqno
      allocated per ring (and only that ring), so there are no ordering
      issues with multiple outstanding seqnos on the same ring.
      
      v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
      clear that we only have one seqno counter for all rings. Suggested by
      Chris Wilson.
      
      v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
      instead of ring->outstanding_lazy_request to make the follow-up
      refactoring more clearly correct. Also improve the commit message
      with issues discussed on irc.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
      Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: outstanding_lazy_request is a u32 · 5391d0cf
      By Daniel Vetter
      So don't assign it false, that's just confusing ... No functional
      change here.
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  4. 12 Feb 2012 (3 commits)
  5. 11 Feb 2012 (10 commits)
  6. 10 Feb 2012 (5 commits)
  7. 09 Feb 2012 (7 commits)
  8. 07 Feb 2012 (1 commit)
  9. 01 Feb 2012 (1 commit)
  10. 31 Jan 2012 (4 commits)
    • drm/i915: rewrite shmem_pread_slow to use copy_to_user · 8461d226
      By Daniel Vetter
      Like for shmem_pwrite_slow. The only difference is that because we
      read data, we can leave the fetched cachelines in the cpu: In the case
      that the object isn't in the cpu read domain anymore, the clflush for
      the next cpu read domain invalidation will simply drop these
      cachelines.
      
      slow_shmem_bit17_copy is now unused, so kill it.
      
      With this patch tests/gem_mmap_gtt now actually works.
      
      v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.
      
      v3: Fixup the swizzling logic, it swizzled the wrong pages.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: rewrite shmem_pwrite_slow to use copy_from_user · 8c59967c
      By Daniel Vetter
      ... instead of get_user_pages, because that fails on non page-backed
      user addresses like e.g. a gtt mapping of a bo.
      
      To get there essentially copy the vfs read path into pagecache. We
      can't call that right away because we have to take care of bit17
      swizzling. To not deadlock with our own pagefault handler we need
      to completely drop struct_mutex, reducing the atomicity guarantees
      of our userspace ABI. Implications for racing with other gem ioctls:
      
      - execbuf, pwrite, pread: Due to the -EFAULT fallback to slow paths there's
        already the risk of the pwrite call not being atomic, no degradation.
      - read/write access to mmaps: already fully racy, no degradation.
      - set_tiling: Calling set_tiling while reading/writing is already
        pretty much undefined, now it just got a bit worse. set_tiling is
        only called by libdrm on unused/new bos, so no problem.
      - set_domain: When changing to the gtt domain while copying (without any
        read/write access, e.g. for synchronization), we might leave unflushed
        data in the cpu caches. The clflush_object at the end of pwrite_slow
        takes care of this problem.
      - truncating of purgeable objects: the shmem_read_mapping_page call could
        reinstate backing storage for truncated objects. The check at the end
        of pwrite_slow takes care of this.
      
      v2:
      - add missing intel_gtt_chipset_flush
      - add __ to copy_from_user_swizzled as suggested by Chris Wilson.
      
      v3: Fixup bit17 swizzling, it swizzled the wrong pages.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: fall through pwrite_gtt_slow to the shmem slow path · 5c0480f2
      By Daniel Vetter
      The gtt_pwrite slowpath grabs the userspace memory with
      get_user_pages. This will not work for non-page backed memory, like a
      gtt mmapped gem object. Hence fall through to the shmem paths if we hit
      -EFAULT in the gtt paths.
      
      Now the shmem paths have exactly the same problem, but this way we
      only need to rearrange the code in one write path.
      
      v2: v1 accidentally falls back to shmem pwrite for phys objects. Fixed.
      
      v3: Make the codeflow around phys_pwrite clearer as suggested by Chris
      Wilson.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: add debugfs file for swizzling information · ea16a3cd
      By Daniel Vetter
      This will also come in handy for the gen6+ swizzling support, where the
      driver is supposed to control swizzling depending upon dram
      configuration.
      
      v2: CxDRB3 are 16 bit regs! Noticed by Chris Wilson.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>