1. 10 Apr, 2012 1 commit
  2. 29 Mar, 2012 1 commit
  3. 27 Mar, 2012 17 commits
  4. 23 Mar, 2012 2 commits
  5. 21 Mar, 2012 3 commits
  6. 02 Mar, 2012 1 commit
  7. 28 Feb, 2012 1 commit
  8. 15 Feb, 2012 1 commit
    • C
      drm/i915: Record the tail at each request and use it to estimate the head · a71d8d94
      Chris Wilson committed
      By recording the location of every request in the ringbuffer, we know
      that in order to retire the request the GPU must have finished reading
      it and so the GPU head is now beyond the tail of the request. We can
      therefore provide a conservative estimate of where the GPU is reading
      from in order to avoid having to read back the ring buffer registers
      when polling for space upon starting a new write into the ringbuffer.
      
      A secondary effect is that this allows us to convert
      intel_ring_buffer_wait() to use i915_wait_request() and so consolidate
      upon the single function to handle the complicated task of waiting upon
      the GPU. A necessary precaution is that we need to make that wait
      uninterruptible to match the existing conditions as all the callers of
      intel_ring_begin() have not been audited to handle ERESTARTSYS
      correctly.
      
      By using a conservative estimate for the head, and always processing all
      outstanding requests first, we prevent a race condition between using
      the estimate and direct reads of I915_RING_HEAD which could result in
      the value of the head going backwards, and the tail overflowing once
      again. We are also careful to mark any request that we skip over in
      order to free space in ring as consumed which provides a
      self-consistency check.
      
      Given sufficient abuse, such as a set of unthrottled GPU bound
      cairo-traces, avoiding the use of I915_RING_HEAD gives a 10-20% boost on
      Sandy Bridge (i5-2520m):
        firefox-paintball  18927ms -> 15646ms: 1.21x speedup
        firefox-fishtank   12563ms -> 11278ms: 1.11x speedup
      which is a mild consolation for the performance those traces achieved from
      exploiting the buggy autoreported head.
      
      v2: Add a few more comments and make request->tail a conservative
      estimate as suggested by Daniel Vetter.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: resolve conflicts with retirement deferring and the lack of
      the autoreport head removal (that will go in through -fixes).]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      a71d8d94
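The idea in this commit message can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not the driver's code: `RING_SIZE`, `MAX_REQUESTS`, the struct layouts, and all function names are hypothetical. Each request remembers the ring offset just past its commands, and retiring a completed request advances a conservative head estimate to that offset, so free space can be computed without reading a hardware register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE    4096u /* hypothetical ring size in bytes, power of two */
#define MAX_REQUESTS 16    /* hypothetical cap on outstanding requests */

struct request {
    uint32_t seqno; /* sequence number reported once the request completes */
    uint32_t tail;  /* ring offset just past this request's commands */
};

struct ring {
    uint32_t head; /* conservative estimate of the GPU read position */
    uint32_t tail; /* CPU write position */
    struct request requests[MAX_REQUESTS]; /* oldest first */
    size_t count;
};

/* Free bytes between tail and head; an 8-byte gap keeps a full ring
 * distinguishable from an empty one. */
static uint32_t ring_space(const struct ring *r)
{
    return (r->head - r->tail - 8u) & (RING_SIZE - 1u);
}

/* Append a request of `bytes` bytes; returns -1 if there is no room. */
static int ring_add_request(struct ring *r, uint32_t seqno, uint32_t bytes)
{
    if (r->count == MAX_REQUESTS || ring_space(r) < bytes)
        return -1;
    r->tail = (r->tail + bytes) & (RING_SIZE - 1u);
    assert(r->tail < RING_SIZE);
    r->requests[r->count].seqno = seqno;
    r->requests[r->count].tail = r->tail;
    r->count++;
    return 0;
}

/* Retire every request up to completed_seqno. The GPU must have read past
 * the tail of a completed request, so that tail is a safe head estimate. */
static void ring_retire(struct ring *r, uint32_t completed_seqno)
{
    size_t done = 0;

    while (done < r->count &&
           (int32_t)(completed_seqno - r->requests[done].seqno) >= 0) {
        r->head = r->requests[done].tail;
        done++;
    }
    /* Drop the retired entries, keeping the remainder oldest first. */
    for (size_t i = done; i < r->count; i++)
        r->requests[i - done] = r->requests[i];
    r->count -= done;
}
```

Because the estimate only moves forward as requests retire, it can lag behind the true GPU position but never overtake it, which is the conservative property the message relies on.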
  9. 13 Feb, 2012 2 commits
    • D
      drm/i915: fixup seqno allocation logic for lazy_request · 53d227f2
      Daniel Vetter committed
      Currently we reserve seqnos only when we emit the request to the ring
      (by bumping dev_priv->next_seqno), but start using it much earlier for
      ring->outstanding_lazy_request. When 2 threads compete for the gpu and
      run on two different rings (e.g. ddx on blitter vs. compositor)
      hilarity ensued, especially when we get constantly interrupted while
      reserving buffers.
      
      Breakage seems to have been introduced in
      
      commit 6f392d54
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Sat Aug 7 11:01:22 2010 +0100
      
          drm/i915: Use a common seqno for all rings.
      
      This patch fixes up the seqno reservation logic by moving it into
      i915_gem_next_request_seqno. The ring->add_request functions still
      superfluously return the new seqno through a pointer; that will
      be refactored in the next patch.
      
      Note that with this change we now unconditionally allocate a seqno,
      even when ->add_request might fail because the rings are full and the
      gpu died. But this does not open up a new can of worms because we can
      already leave behind an outstanding_request_seqno if e.g. the caller
      gets interrupted with a signal while stalling for the gpu in the
      eviction paths. And with the bugfix we only ever have one seqno
      allocated per ring (and only that ring), so there are no ordering
      issues with multiple outstanding seqnos on the same ring.
      
      v2: Keep i915_gem_get_seqno (but move it to i915_gem.c) to make it
      clear that we only have one seqno counter for all rings. Suggested by
      Chris Wilson.
      
      v3: As suggested by Chris Wilson use i915_gem_next_request_seqno
      instead of ring->outstanding_lazy_request to make the follow-up
      refactoring more clearly correct. Also improve the commit message
      with issues discussed on irc.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45181
      Tested-by: Nicolas Kalkhof nkalkhof()at()web.de
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      53d227f2
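A rough sketch of the reservation scheme described in this message, assuming a single device-wide counter and one lazily reserved seqno per ring. The struct layouts and function names below are simplified stand-ins chosen for illustration and are not the driver's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified state: one seqno counter shared by all rings. */
struct device {
    uint32_t next_seqno;
};

/* 0 means "no seqno reserved for the next request on this ring". */
struct ring {
    uint32_t outstanding_lazy_request;
};

/* Hand out the next seqno and bump the shared counter, skipping 0,
 * which is reserved to mean "none". */
static uint32_t get_seqno(struct device *dev)
{
    uint32_t seqno = dev->next_seqno;

    assert(seqno != 0);
    if (++dev->next_seqno == 0)
        dev->next_seqno = 1;
    return seqno;
}

/* Reserve a seqno the first time it is needed, so that two rings can
 * never start using the same value before either emits its request. */
static uint32_t next_request_seqno(struct device *dev, struct ring *ring)
{
    if (ring->outstanding_lazy_request == 0)
        ring->outstanding_lazy_request = get_seqno(dev);
    return ring->outstanding_lazy_request;
}

/* Emitting the request consumes the reservation. */
static uint32_t add_request(struct device *dev, struct ring *ring)
{
    uint32_t seqno = next_request_seqno(dev, ring);

    ring->outstanding_lazy_request = 0;
    return seqno;
}
```

The point of the sketch is that the counter is bumped at reservation time, not at emission time, so a second ring asking for a seqno in between receives a different value.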
    • D
      drm/i915: outstanding_lazy_request is a u32 · 5391d0cf
      Daniel Vetter committed
      So don't assign it false, that's just confusing ... No functional
      change here.
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      5391d0cf
  10. 10 Feb, 2012 2 commits
  11. 09 Feb, 2012 2 commits
  12. 31 Jan, 2012 4 commits
    • D
      drm/i915: rewrite shmem_pread_slow to use copy_to_user · 8461d226
      Daniel Vetter committed
      Like for shmem_pwrite_slow. The only difference is that because we
      read data, we can leave the fetched cachelines in the cpu: In the case
      that the object isn't in the cpu read domain anymore, the clflush for
      the next cpu read domain invalidation will simply drop these
      cachelines.
      
      slow_shmem_bit17_copy is now unused, so kill it.
      
      With this patch tests/gem_mmap_gtt now actually works.
      
      v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.
      
      v3: Fixup the swizzling logic, it swizzled the wrong pages.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8461d226
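The swizzled copy mentioned in the v2 and v3 notes can be illustrated with a small userspace sketch. It assumes 64-byte cachelines and that swizzling swaps each adjacent pair of cachelines (offset XOR 64); `copy_swizzled` is a hypothetical name, and plain `memcpy` stands in for the kernel's user-copy helpers. As far as I understand, the driver applies this only to pages whose physical address has bit 17 set; the sketch leaves that decision to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHELINE 64u

/* Copy `length` bytes starting at `gpu_offset` inside `gpu_page` into `dst`,
 * reading each cacheline from its swizzled partner. The caller must ensure
 * the partner cachelines lie inside the source buffer. */
static void copy_swizzled(char *dst, const char *gpu_page,
                          size_t gpu_offset, size_t length)
{
    size_t copied = 0;

    assert(dst != NULL && gpu_page != NULL);
    while (length > 0) {
        /* Never cross a cacheline boundary within one chunk. */
        size_t cacheline_end = (gpu_offset / CACHELINE + 1) * CACHELINE;
        size_t this_length = cacheline_end - gpu_offset;
        size_t swizzled_offset = gpu_offset ^ CACHELINE;

        if (this_length > length)
            this_length = length;
        memcpy(dst + copied, gpu_page + swizzled_offset, this_length);

        copied += this_length;
        gpu_offset += this_length;
        length -= this_length;
    }
}
```

Splitting the copy at cacheline boundaries matters because a range that straddles two cachelines maps to two non-contiguous source regions once the offsets are swizzled.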
    • D
      drm/i915: rewrite shmem_pwrite_slow to use copy_from_user · 8c59967c
      Daniel Vetter committed
      ... instead of get_user_pages, because that fails on non page-backed
      user addresses like e.g. a gtt mapping of a bo.
      
      To get there essentially copy the vfs read path into pagecache. We
      can't call that right away because we have to take care of bit17
      swizzling. To not deadlock with our own pagefault handler we need
      to completely drop struct_mutex, reducing the atomicity guarantees
      of our userspace abi. Implications for racing with other gem ioctl:
      
      - execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
        already the risk of the pwrite call not being atomic, no degradation.
      - read/write access to mmaps: already fully racy, no degradation.
      - set_tiling: Calling set_tiling while reading/writing is already
        pretty much undefined, now it just got a bit worse. set_tiling is
        only called by libdrm on unused/new bos, so no problem.
      - set_domain: When changing to the gtt domain while copying (without any
        read/write access, e.g. for synchronization), we might leave unflushed
        data in the cpu caches. The clflush_object at the end of pwrite_slow
        takes care of this problem.
      - truncating of purgeable objects: the shmem_read_mapping_page call could
        reinstate backing storage for truncated objects. The check at the end
        of pwrite_slow takes care of this.
      
      v2:
      - add missing intel_gtt_chipset_flush
      - add __ to copy_from_user_swizzled as suggested by Chris Wilson.
      
      v3: Fixup bit17 swizzling, it swizzled the wrong pages.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8c59967c
    • D
      drm/i915: fall through pwrite_gtt_slow to the shmem slow path · 5c0480f2
      Daniel Vetter committed
      The gtt_pwrite slowpath grabs the userspace memory with
      get_user_pages. This will not work for non-page backed memory, like a
      gtt mmapped gem object. Hence fall through to the shmem paths if we hit
      -EFAULT in the gtt paths.
      
      Now the shmem paths have exactly the same problem, but this way we
      only need to rearrange the code in one write path.
      
      v2: v1 accidentally falls back to shmem pwrite for phys objects. Fixed.
      
      v3: Make the codeflow around phys_pwrite clearer as suggested by Chris
      Wilson.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      5c0480f2
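The control flow this message describes, trying the GTT path first and falling through to the shmem path on -EFAULT, might look roughly like the following userspace sketch. Everything here is hypothetical: `struct user_buf`, the `page_backed` flag, and the three functions are stand-ins invented for illustration, with `memcpy` in place of the real copy routines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a userspace source buffer. */
struct user_buf {
    const char *data;
    size_t len;
    int page_backed; /* 0 models e.g. a gtt mmap of another bo */
};

/* Stand-in for the GTT write path: it cannot handle sources that are
 * not backed by ordinary pages, so it reports -EFAULT for them. */
static int gtt_pwrite_fast(char *dst, const struct user_buf *src)
{
    if (!src->page_backed)
        return -EFAULT;
    memcpy(dst, src->data, src->len);
    return 0;
}

/* Stand-in for the shmem write path, which copies directly from the
 * user address and therefore works for any readable source. */
static int shmem_pwrite(char *dst, const struct user_buf *src)
{
    memcpy(dst, src->data, src->len);
    return 0;
}

/* Try the GTT path first and fall through to shmem only on -EFAULT. */
static int object_pwrite(char *dst, const struct user_buf *src,
                         int *used_shmem)
{
    int ret;

    assert(dst != NULL && src != NULL && used_shmem != NULL);
    *used_shmem = 0;
    ret = gtt_pwrite_fast(dst, src);
    if (ret == -EFAULT) {
        *used_shmem = 1;
        ret = shmem_pwrite(dst, src);
    }
    return ret;
}
```

Keeping the fallback in the caller means only one write path needs to learn how to cope with non-page-backed sources, which is the rearrangement the message argues for.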
    • C
      drm/i915: Remove the upper limit on the bo size for mapping into the CPU domain · 068c6ff1
      Chris Wilson committed
      The original intention of comparing the bo against the mappable GTT
      limits was to prevent a subsequent faulting of the bo into the GTT from
      clearing the entire GTT in vain. However, that was clearly a cut'n'paste
      mistake as a CPU mapping never binds the bo into the aperture. Whilst
      there may be some merit to limiting the maximum size of the bo to
      something that can be utilized by the GPU, that limit itself does not
      belong as a safeguard to mmapping the bo, so remove the check entirely.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eric Anholt <eric@anholt.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      068c6ff1
  13. 30 Jan, 2012 2 commits
  14. 26 Jan, 2012 1 commit