1. 22 1月, 2013 1 次提交
    • D
      drm/i915: create a race-free reset detection · f69061be
      Daniel Vetter 提交于
      With the previous patch the state transition handling of the reset
      code itself is now (hopefully) race free and solid. But that still
      leaves out everyone else - with the various lock-free wait paths
      we have there's the possibility that the reset happens between the
      point where we read the seqno we should wait on and the actual wait.
      
      And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
      happily wait for a seqno which will in all likelyhood never signal.
      
      In practice this is not a big problem since the X server gets
      constantly interrupted, and can then submit more work (hopefully) to
      unblock everyone else: As soon as a new seqno write lands, all waiters
      will unblock. But running the i-g-t reset testcase ZZ_hangman can
      expose this race, especially on slower hw with fewer cpu cores.
      
      Now looking forward to ARB_robustness and friends that's not the best
      possible behaviour, hence this patch adds a reset_counter to be able
      to detect any reset, even if a given thread never observed the
      in-progress state.
      
      The important part is to correctly order things:
      - The write side needs to increment the counter after any seqno gets
        reset.  Hence we need to do that at the end of the reset work, and
        again wake everyone up. We also need to place a barrier in between
        any possible seqno changes and the counter increment, since any
        unlock operations only guarantee that nothing leaks out, but not
        that at later load operation gets moved ahead.
      - On the read side we need to ensure that no reset can sneak in and
        invalidate the seqno. In all cases we can use the one-sided barrier
        that unlock operations guarantee (of the lock protecting the
        respective seqno/ring pair) to ensure correct ordering. Hence it is
        sufficient to place the atomic read before the mutex/spin_unlock and
        no additional barriers are required.
      
      The end-result of all this is that we need to wake up everyone twice
      in a reset operation:
      - First, before the reset starts, to get any lockholders of the locks,
        so that the reset can proceed.
      - Second, after the reset is completed, to allow waiters to properly
        and reliably detect the reset condition and bail out.
      
      I admit that this entire reset_counter thing smells a bit like
      overkill, but I think it's justified since it makes it really explicit
      what the bail-out condition is. And we need a reset counter anyway to
      implement ARB_robustness, and imo with finer-grained locking on the
      horizont this is the most resilient scheme I could think of.
      
      v2: Drop spurious change in the wait_for_error EXIT_COND - we only
      need to wait until we leave the reset-in-progress wedged state.
      
      v3: Don't play tricks with barriers in the throttle ioctl, the
      spin_unlock is barrier enough.
      
      I've also considered using a little helper to grab the current
      reset_counter, but then decided that hiding the atomic_read isn't a
      great idea, since having it explicitly show up in the code is a nice
      remainder to reviews to check the memory barriers.
      
      v4: Add a comment to explain why we need to fall through in
      __wait_seqno in the end variable assignments.
      
      v5: Review from Damien:
      - s/smb/smp/ in a comment
      - don't increment the reset counter after we've set it to WEDGED. Now
        we (again) properly wedge the gpu when the reset fails.
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      f69061be
  2. 20 1月, 2013 7 次提交
    • C
      drm/i915: Only apply the mb() when flushing the GTT domain during a finish · 97c809fd
      Chris Wilson 提交于
      Now that we seem to have brought order to the GTT barriers, the last one
      to review is the terminal barrier before we unbind the buffer from the
      GTT. This needs to only be performed if the buffer still resides in the
      GTT domain, and so we can skip some needless barriers otherwise.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      97c809fd
    • C
      drm/i915: Only insert the mb() before updating the fence parameter · d0a57789
      Chris Wilson 提交于
      With a fence, we only need to insert a memory barrier around the actual
      fence alteration for CPU accesses through the GTT. Performing the
      barrier in flush-fence was inserting unnecessary and expensive barriers
      for never fenced objects.
      
      Note removing the barriers from flush-fence, which was effectively a
      barrier before every direct access through the GTT, revealed that we
      where missing a barrier before the first access through the GTT. Lack of
      that barrier was sufficient to cause GPU hangs.
      
      v2: Add a couple more comments to explain the new barriers
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d0a57789
    • D
      drm/i915: clear up wedged transitions · 1f83fee0
      Daniel Vetter 提交于
      We have two important transitions of the wedged state in the current
      code:
      
      - 0 -> 1: This means a hang has been detected, and signals to everyone
        that they please get of any locks, so that the reset work item can
        do its job.
      
      - 1 -> 0: The reset handler has completed.
      
      Now the last transition mixes up two states: "Reset completed and
      successful" and "Reset failed". To distinguish these two we do some
      tricks with the reset completion, but I simply could not convince
      myself that this doesn't race under odd circumstances.
      
      Hence split this up, and add a new terminal state indicating that the
      hw is gone for good.
      
      Also add explicit #defines for both states, update comments.
      
      v2: Split out the reset handling bugfix for the throttle ioctl.
      
      v3: s/tmp/wedged/ sugested by Chris Wilson. Also fixup up a rebase
      error which prevented this patch from actually compiling.
      
      v4: To unify the wedged state with the reset counter, keep the
      reset-in-progress state just as a flag. The terminally-wedged state is
      now denoted with a big number.
      
      v5: Add a comment to the reset_counter special values explaining that
      WEDGED & RESET_IN_PROGRESS needs to be true for the code to be
      correct.
      
      v6: Fixup logic errors introduced with the wedged+reset_counter
      unification. Since WEDGED implies reset-in-progress (in a way we're
      terminally stuck in the dead-but-reset-not-completed state), we need
      ensure that we check for this everywhere. The specific bug was in
      wait_for_error, which would simply have timed out.
      
      v7: Extract an inline i915_reset_in_progress helper to make the code
      more readable. Also annote the reset-in-progress case with an
      unlikely, to help the compiler optimize the fastpath. Do the same for
      the terminally wedged case with i915_terminally_wedged.
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-Off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      1f83fee0
    • D
      drm/i915: fix reset handling in the throttle ioctl · 308887aa
      Daniel Vetter 提交于
      While auditing the code I've noticed one place (the throttle ioctl)
      which does not yet wait for the reset handler to complete and doesn't
      properly decode the wedge state into -EAGAIN/-EIO. Fix this up by
      calling the right helpers. This might explain the oddball "my
      compositor just died in a successfull gpu reset" reports. Or maybe not, since
      current mesa doesn't use this ioctl to throttle command submission.
      
      The throttle ioctl doesn't take the struct_mutex, so to avoid busy-looping
      with -EAGAIN while a reset is in process, check for errors first and wait
      for the handler to complete if a reset is pending by calling
      i915_gem_wait_for_error.
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      308887aa
    • D
      drm/i915: move wedged to the other gpu error handling stuff · 33196ded
      Daniel Vetter 提交于
      And to make Ben Widawsky happier, use the gpu_error instead of
      the entire device as the argument in some functions.
      
      Drop the outdated comment on ->wedged for now, a follow-up patch will
      change the semantics and add a proper comment again.
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      33196ded
    • D
      drm/i915: extract hangcheck/reset/error_state state into substruct · 99584db3
      Daniel Vetter 提交于
      This has been sprinkled all over the place in dev_priv. I think
      it'd be good to also move all the code into a separate file like
      i915_gem_error.c, but that's for another patch.
      Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      99584db3
    • B
      drm/i915: Remove use of gtt_mappable_entries · 93d18799
      Ben Widawsky 提交于
      Mappable_end, ie. size is almost always what you want as opposed to the
      number of entries. Since we already have that information, we can scrap
      the number of entries and only calculate it when needed.
      
      If gtt_start is !0, this will have slightly different behavior. This
      difference can only occur in DRI1, and exists when we try to kick out
      the firmware fb. The new code seems like a bugfix to me.
      
      The other case where we've changed the behavior is during init we check
      the mappable region against our current known upper and lower limits
      (64MB, and 512MB). This now matches the comment, and makes things more
      convenient after removing gtt_mappable_entries.
      
      Also worth noting is the setting of mappable_end is taken out of setup
      because we do it earlier now in the DRI2 case and therefore need to add
      that tiny hunk to support the DRI1 IOCTL.
      
      v2: Move up mappable end to before legacy AGP init
      
      v3: Add the dev_priv inclusion here from previous rebase error in patch
      5
      
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> (v2)
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      [danvet: squash in fix for a printk format flag mismatch warning.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      93d18799
  3. 18 1月, 2013 6 次提交
  4. 20 12月, 2012 3 次提交
    • B
      drm/i915: Move even more gtt code to i915_gem_gtt · d7e5008f
      Ben Widawsky 提交于
      This really should have been part of the kill agp series.
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d7e5008f
    • D
      drm/i915: disable shrinker lock stealing for create_mmap_offset · da494d7c
      Daniel Vetter 提交于
      The mmap offset structure is not part of the drm/i915 code, but
      provided by gem helpers. To avoid leaky abstractions (by either
      depending upon implementation details of said helper wrt to
      preallocations, or reimplementing it in our code and so fuzzing
      around in internal details of that helpr) simply disable
      the shrinker lock stealing accross calls into the helper functions.
      
      This should fix igt/gem_tiled_swapping.
      
      v2: Fix cleanup path confusion bemoaned by Chris Wilson.
      Reported-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      da494d7c
    • D
      drm/i915: optionally disable shrinker lock stealing · 677feac2
      Daniel Vetter 提交于
      commit 5774506f
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Nov 21 13:04:04 2012 +0000
      
          drm/i915: Borrow our struct_mutex for the direct reclaim
      
      added a nice trick to steal the struct_mutex lock in the shrinker if
      it's the current task holding it. But this also caused the requirement
      that every place which allocates memory needs to be careful about the
      gem state of objects, since the shrinker could have pulled the rug out
      from under it. We've usually solved this by carefully preallocating
      things or ensure that buffers are pinned already.
      
      But the shrinker also reaps mmap offset, so allocating those needs to
      be careful, too. Now that code has been factored out into some common
      helpers, so either we have fragile code depending upon the common
      helper not doing something we don't want it to do. Or we need to
      reimplement the mmap offset creation and so also leak implementation
      details into our code.
      
      Since this all results in leaky abstraction, cop out by disabling the
      lock borrowing trick while calling down into the helpers. That way our
      craziness is nicely confined to files in drm/i915.
      
      v2: Split out the change to create_mmap_offset as request by Chris Wilson.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      677feac2
  5. 19 12月, 2012 6 次提交
  6. 17 12月, 2012 1 次提交
  7. 11 12月, 2012 4 次提交
  8. 07 12月, 2012 1 次提交
  9. 06 12月, 2012 1 次提交
  10. 04 12月, 2012 3 次提交
  11. 01 12月, 2012 2 次提交
  12. 29 11月, 2012 5 次提交
    • D
      drm/i915: optimize the shmem_pwrite slowpath handling · 8dcf015e
      Daniel Vetter 提交于
      Since we drop dev->struct_mutex when going through the slowpath, the
      object might have been moved out of the cpu domain. Hence we need to
      clflush the entire object to ensure that after the ioctl returns,
      everything is coherent again (interwoven writes are ill-defined
      anyway).
      
      But we only need to do this if we start in the cpu domain and the
      object requires flushing for coherency. So don't do the flushing if
      the object is coherent anyway or if we've done in-line clfushing
      already.
      
      v2: i915_gem_clflush_object already checks whether the object is
      coherent and if so, drops the flushing. Hence we don't need to check
      that ourselves, simplifying the condition.
      
      v3: Reorder the checks for better clarity (and adjust the comment
      accordingly), suggested by Chris Wilson.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      8dcf015e
    • D
      drm/i915: simplify shmem pwrite/pread slowpath handling · a39a6805
      Daniel Vetter 提交于
      The shmem paths for pwrite/pread used a clever trick to hold onto a
      single page when dropping the big dev->struct_mutex for the slowpath.
      But this ran the risk of reinstating (or not completely purging) the
      backing storage when dropping purgeable objects.
      
      Hence the code needed to keep track of whether it ever dropped the
      lock, and if it did, manually check whether it needs to re-purge the
      backing storage. But thanks to the pages pin count introduced in
      
      commit a5570178
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Sep 4 21:02:54 2012 +0100
      
          drm/i915: Pin backing pages whilst exporting through a dmabuf vmap
      
      which allowed us to pin the backing storage and remove that page
      reference trick from shmem_pwrite/read in
      
      commit f60d7f0c
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Sep 4 21:02:56 2012 +0100
      
          drm/i915: Pin backing pages for pread
      
      and
      
      commit 755d2218
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Sep 4 21:02:55 2012 +0100
      
          drm/i915: Pin backing pages for pwrite
      
      we can now abolish this check. The slowpath cleanup completely
      disappears from pread, and for pwrite we're only left with the domain
      fixup in case someone moved the object out of the cpu domain from
      under us. A follow-on patch will optimize that a notch more.
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a39a6805
    • M
      drm/i915: Set sync_seqno properly after seqno wrap · 7b01e260
      Mika Kuoppala 提交于
      i915_gem_handle_seqno_wrap() will zero all sync_seqnos but as the
      wrap can happen inside ring->sync_to(), pre wrap seqno was
      carried over and overwrote the zeroed sync_seqno.
      
      When wrap is handled, all outstanding requests will be retired and
      objects moved to inactive queue, causing their last_read_seqno to be zero.
      Use this to update the sync_seqno correctly.
      
      RING_SYNC registers after wrap will contain pre wrap values which
      are >= seqno. So injecting the semaphore wait into ring completes
      immediately.
      
      Original idea for using last_read_seqno from Chris Wilson.
      Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      7b01e260
    • C
      drm/i915: Rearrange code to only have a single method for waiting upon the ring · 3e960501
      Chris Wilson 提交于
      Replace the wait for the ring to be clear with the more common wait for
      the ring to be idle. The principle advantage is one less exported
      intel_ring_wait function, and the removal of a hardcoded value.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      3e960501
    • C
      drm/i915: Simplify flushing activity on the ring · b662a066
      Chris Wilson 提交于
      As we now always preallocate the seqno before writing to the ring, we
      can trivially test if we have any pending activity on the ring by
      inspecting the olr. This makes it then possible to flush operations that
      are not normally associated with a request, like power-management.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b662a066