1. 10 July 2013, 3 commits
    • Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs" · 46a0b638
      Committed by Chris Wilson
      This reverts commit 25ff1195 and the follow on for Valleyview commit 2dc8aae0.
      
      commit 25ff1195
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Apr 4 21:31:03 2013 +0100
      
          drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
      
      commit 2dc8aae0
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed May 22 17:08:06 2013 +0100
      
          drm/i915: Workaround incoherence with fence updates on Valleyview
      
      Jon Bloomfield came up with a plausible explanation and cheap fix
      (drm/i915: Fix incoherence with fence updates on Sandybridge+) for the
      race condition, so let's run with it.
      
      This is a candidate for stable as the old workaround incurs a
      significant cost (calling wbinvd on all CPUs before performing the
      register write) for some workloads as noted by Carsten Emde.
      
      Link: http://lists.freedesktop.org/archives/intel-gfx/2013-June/028819.html
      References: https://www.osadl.org/?id=1543#c7602
      References: https://bugs.freedesktop.org/show_bug.cgi?id=63825
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Carsten Emde <C.Emde@osadl.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fix incoherence with fence updates on Sandybridge+ · d18b9619
      Committed by Chris Wilson
      This hopefully fixes the root cause behind the workaround added in
      
      commit 25ff1195
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Thu Apr 4 21:31:03 2013 +0100
      
          drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
      
      Thanks to further investigation, Jon Bloomfield realised that the
      64-bit register write might be broken up by the hardware into two
      32-bit writes (a problem we have encountered elsewhere). This
      non-atomicity would then cause an issue where a second thread would
      see an intermediate register state (new high dword, old low dword),
      and this register would randomly be used in preference to its own
      thread's register. This would cause the second thread to read from
      and write into a fairly random tiled location.  Breaking the
      operation into 3 explicit 32-bit updates (first disable the fence,
      poke the upper bits, then poke the lower bits and enable) ensures,
      given proper serialisation between the 32-bit register write and the
      memory transfer, that the fence value is always consistent.
      
      Armed with this knowledge, we can explain how the previous workaround
      worked. The key to the corruption is that a second thread sees an
      erroneous fence register that conflicts with and overrides its own. By
      serialising the fence update across all CPUs, we have a small window
      where no GTT access is occurring and so hide the potential corruption.
      This also leads to the conclusion that the earlier workaround was
      incomplete.
      
      v2: Be overly paranoid about the order in which fence updates become
      visible to the GPU to make really sure that we turn the fence off before
      doing the update, and then only switch the fence on afterwards.
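      
      A minimal sketch of that three-step update, assuming the usual i915
      MMIO helpers (I915_WRITE/POSTING_READ) and a hypothetical 64-bit fence
      register starting at offset fence_reg; this is illustrative, not the
      exact driver code:
      
      /* Illustrative sketch only: fence_reg and the MMIO accessors are
       * assumed to behave like the i915 helpers of this era. */
      static void sketch_write_fence_reg(struct drm_i915_private *dev_priv,
                                         u32 fence_reg, u64 val)
      {
              /* 1. Turn the fence off so no access can observe a half-written
               *    (new high dword, old low dword) combination. */
              I915_WRITE(fence_reg, 0);
              POSTING_READ(fence_reg);
      
              /* 2. Poke the upper 32 bits while the fence is disabled. */
              I915_WRITE(fence_reg + 4, upper_32_bits(val));
              POSTING_READ(fence_reg + 4);
      
              /* 3. Poke the lower 32 bits, which also carry the enable bit,
               *    so the complete fence becomes visible in one write. */
              I915_WRITE(fence_reg, lower_32_bits(val));
              POSTING_READ(fence_reg);
      }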
      Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Carsten Emde <C.Emde@osadl.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fix write-read race with multiple rings · 02978ff5
      Committed by Chris Wilson
      Daniel noticed a problem: if we wrote to an object with ring A in
      the middle of a very long running batch, then executed a quick batch on
      ring B before a batch that reads from the same object, its obj->ring
      would now point to ring B, but its last_write_seqno would still be
      relative to ring A. This would allow the user to read from the object
      before the GPU had completed the write, as set_domain would only check
      that ring B had passed the last_write_seqno.
      
      To fix this simply (and inelegantly), we bump the last_write_seqno when
      switching rings so that the last_write_seqno is always relative to the
      current obj->ring.
      
      This fixes igt/tests/gem_write_read_ring_switch.
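      
      A hedged sketch of the bump described above; the field names follow the
      commit text, but the surrounding move-to-active bookkeeping is an
      assumption, not the exact patch:
      
      /* Sketch only: keep last_write_seqno meaningful when an active object
       * migrates from one ring to another. */
      static void sketch_move_to_active(struct drm_i915_gem_object *obj,
                                        struct intel_ring_buffer *ring,
                                        u32 seqno)
      {
              if (obj->ring != ring && obj->last_write_seqno) {
                      /* Bump the write seqno so it is relative to the new
                       * obj->ring; set_domain then waits on the right pair. */
                      obj->last_write_seqno = seqno;
              }
      
              obj->ring = ring;
              obj->last_read_seqno = seqno;
      }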
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      [danvet: Add note about the newly created igt which exercises this
      bug.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 09 July 2013, 1 commit
  3. 01 July 2013, 4 commits
    • drm/i915: Refactor the wait_rendering completion into a common routine · d26e3af8
      Committed by Chris Wilson
      Harmonise the completion logic between the non-blocking and normal
      wait_rendering paths, and move that logic into a common function.
      
      In the process, we note that the last_write_seqno is by definition the
      earlier of the two read/write seqnos and so all successful waits will
      have passed the last_write_seqno. Therefore we can unconditionally clear
      the write seqno and its domains in the completion logic.
      
      v2: Add the missing ring parameter, because sometimes it is good to have
      things compile.
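      
      Roughly, the shared completion helper could look like the sketch below;
      the function and field names are assumptions based on the commit text,
      not the exact patch:
      
      /* Hedged sketch of a common completion routine shared by the blocking
       * and non-blocking wait_rendering paths. */
      static int sketch_wait_rendering_tail(struct drm_i915_gem_object *obj,
                                            struct intel_ring_buffer *ring)
      {
              i915_gem_retire_requests_ring(ring);
      
              /* last_write_seqno is by definition no later than the read
               * seqno, so any successful wait has passed it: clear it and
               * the GPU write domain unconditionally. */
              obj->last_write_seqno = 0;
              obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
      
              return 0;
      }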
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Only clear write-domains after a successful wait-seqno · daa13e1c
      Committed by Chris Wilson
      In the introduction of the non-blocking wait, I cut'n'pasted the wait
      completion code from the normal locked path. Unfortunately, this
      neglected that the normal path returned early if the wait returned
      early. The result is that read-only waits may return whilst the GPU is
      still writing to the bo.
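      
      The shape of the fix, sketched with assumed names and reusing the
      sketch_wait_rendering_tail() sketch from the previous entry; the real
      patch touches the non-blocking wait helper in i915_gem.c:
      
      /* Hedged sketch: return early when the wait itself returns early,
       * exactly like the locked path, instead of falling through and
       * clearing the write domain while the GPU may still be writing. */
      static int sketch_wait_rendering_nonblocking(struct drm_i915_gem_object *obj,
                                                   struct intel_ring_buffer *ring,
                                                   u32 seqno)
      {
              int ret;
      
              ret = i915_wait_seqno(ring, seqno);     /* assumed wait helper */
              if (ret)
                      return ret;     /* early return: leave domains alone */
      
              /* Only a successful wait may run the completion logic. */
              return sketch_wait_rendering_tail(obj, ring);
      }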
      
      Fixes regression from
      commit 3236f57a [v3.7]
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Fri Aug 24 09:35:09 2012 +0100
      
          drm/i915: Use a non-blocking wait for set-to-domain ioctl
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66163
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: fix build warning on format specifier mismatch · 3765f304
      Committed by Jani Nikula
      drivers/gpu/drm/i915/i915_gem.c: In function ‘i915_gem_object_bind_to_gtt’:
      drivers/gpu/drm/i915/i915_gem.c:3002:3: warning: format ‘%ld’ expects
      argument of type ‘long int’, but argument 5 has type ‘size_t’ [-Wformat]
      
      v2: Use %zu instead of %d. Two char patch, and 100% wrong. (Ville)
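      
      For illustration, a small userspace-style reproduction of the warning
      and its fix (the actual DRM_DEBUG call site in i915_gem.c is not
      reproduced here):
      
      #include <stdio.h>
      #include <stddef.h>
      
      int main(void)
      {
              size_t gtt_size = 4096;
      
              /* printf("size %ld\n", gtt_size);  <- -Wformat: %ld vs size_t */
              printf("size %zu\n", gtt_size);  /* %zu is the portable size_t spec */
              return 0;
      }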
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: make compact dma scatter lists creation work with SWIOTLB backend. · 1625e7e5
      Committed by Konrad Rzeszutek Wilk
      Git commit 90797e6d
      ("drm/i915: create compact dma scatter lists for gem objects") makes
      certain assumptions about the underlying DMA API that are not always
      correct.
      
      On a ThinkPad X230 with an Intel HD 4000, running under Xen, I see
      during bootup:
      
      [drm:intel_pipe_set_base] *ERROR* pin & fence failed
      [drm:intel_crtc_set_config] *ERROR* failed to set mode on [CRTC:3], err = -28
      
      A bit of debugging traced it down to dma_map_sg failing (in
      i915_gem_gtt_prepare_object) as some of the SG entries were huge (3MB).
      
      Those, unfortunately, are sizes that the SWIOTLB is incapable of
      handling - the maximum it can handle is an entry of 512KB of virtually
      contiguous memory for its bounce buffer (see IO_TLB_SEGSIZE).
      
      Prior to the above-mentioned git commit the SG entries were 4KB each,
      and the code introduced by that commit squashed CPU-contiguous PFNs
      into one big virtual address provided to the DMA API.
      
      This patch is a simple semi-revert - where we emulate the old behavior
      if we detect that SWIOTLB is online. If it is not online then we
      continue on with the new compact scatter gather mechanism.
      
      An alternative solution would be for the '.get_pages' and the
      i915_gem_gtt_prepare_object code to retry with a smaller maximum gap
      between the PFNs that can be combined together - but with this issue
      discovered during rc7 that might be too risky.
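      
      A hedged sketch of the approach: swiotlb_nr_tbl() is assumed here as
      the era-appropriate way to ask whether SWIOTLB is active, and the loop
      is a simplification of the driver's '.get_pages' path, not the exact
      patch (end-of-table marking and error handling are omitted):
      
      #include <linux/scatterlist.h>
      #include <linux/swiotlb.h>
      
      /* Sketch: one sg entry per 4KB page when SWIOTLB is active, coalesced
       * entries otherwise. */
      static void sketch_build_sg(struct sg_table *st, struct page **pages,
                                  unsigned long n_pages)
      {
              struct scatterlist *sg = st->sgl;
              unsigned long last_pfn = 0;
              unsigned long i;
      
              st->nents = 0;
              for (i = 0; i < n_pages; i++) {
                      struct page *page = pages[i];
      
      #ifdef CONFIG_SWIOTLB
                      if (swiotlb_nr_tbl()) {
                              /* Bounce buffers cap a segment at IO_TLB_SEGSIZE
                               * (512KB), so keep every entry a single page. */
                              st->nents++;
                              sg_set_page(sg, page, PAGE_SIZE, 0);
                              sg = sg_next(sg);
                              continue;
                      }
      #endif
                      if (!i || page_to_pfn(page) != last_pfn + 1) {
                              if (i)
                                      sg = sg_next(sg);
                              st->nents++;
                              sg_set_page(sg, page, PAGE_SIZE, 0);
                      } else {
                              /* CPU-contiguous PFN: grow the current entry. */
                              sg->length += PAGE_SIZE;
                      }
                      last_pfn = page_to_pfn(page);
              }
      }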
      Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      CC: Chris Wilson <chris@chris-wilson.co.uk>
      CC: Imre Deak <imre.deak@intel.com>
      CC: Daniel Vetter <daniel.vetter@ffwll.ch>
      CC: David Airlie <airlied@linux.ie>
      CC: <dri-devel@lists.freedesktop.org>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  4. 13 June 2013, 3 commits
  5. 03 June 2013, 4 commits
  6. 01 June 2013, 2 commits
  7. 23 May 2013, 2 commits
  8. 22 May 2013, 1 commit
  9. 06 May 2013, 1 commit
  10. 30 April 2013, 1 commit
  11. 18 April 2013, 4 commits
  12. 09 April 2013, 1 commit
  13. 28 March 2013, 1 commit
  14. 27 March 2013, 1 commit
  15. 23 March 2013, 2 commits
  16. 19 March 2013, 1 commit
  17. 04 March 2013, 1 commit
  18. 23 February 2013, 1 commit
  19. 20 February 2013, 2 commits
  20. 15 February 2013, 1 commit
    • drm/i915: Extract ring init from hw_init · 4fc7c971
      Committed by Ben Widawsky
      The ring initialization will differ a bit in upcoming generations, and
      this split will prepare the code for what's needed.
      
      This patch also fixes a bug introduced in:
      commit 99433931
      Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Date:   Tue Jan 22 14:12:17 2013 +0200
      
          drm/i915: use gem_set_seqno() on hardware init
      
      After doing the extraction, the bad error handling became obvious.  I
      acknowledge that this should be two patches, but it's a pretty
      small/trivial patch. If requested, I can certainly do the fix as a
      distinct patch.
      
      v2: Should be cleanup blt, not init blt on failure (Chris)
      
      v3: Forgot to git add on v2
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  21. 31 January 2013, 1 commit
  22. 22 January 2013, 2 commits
    • drm/i915: use gem_set_seqno() on hardware init · 99433931
      Committed by Mika Kuoppala
      When the machine was rebooted or the module was reloaded,
      gem_hw_init() set last_seqno to be identical to next_seqno.
      This led to a situation where waits for the first ever request
      always passed immediately, regardless of whether it had actually
      executed.
      
      Use gem_set_seqno() to be consistent with how the hw is
      initialized on init, wrap and resume.
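      
      A hedged sketch of what gem_set_seqno()-style initialisation buys us
      (the field names are assumptions): with last_seqno one below
      next_seqno, the wait for the first request cannot complete before the
      ring actually signals it.
      
      /* Sketch only: last_seqno must start *behind* next_seqno, otherwise a
       * seqno-passed check against the first request is already true before
       * that request has executed. */
      static void sketch_set_seqno(struct drm_i915_private *dev_priv, u32 seqno)
      {
              dev_priv->next_seqno = seqno;           /* first request gets this  */
              dev_priv->last_seqno = seqno - 1;       /* nothing has completed yet */
      }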
      Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: create a race-free reset detection · f69061be
      Committed by Daniel Vetter
      With the previous patch the state transition handling of the reset
      code itself is now (hopefully) race free and solid. But that still
      leaves out everyone else - with the various lock-free wait paths
      we have there's the possibility that the reset happens between the
      point where we read the seqno we should wait on and the actual wait.
      
      And if __wait_seqno then never sees the RESET_IN_PROGRESS state, we'll
      happily wait for a seqno which will in all likelihood never signal.
      
      In practice this is not a big problem since the X server gets
      constantly interrupted, and can then submit more work (hopefully) to
      unblock everyone else: As soon as a new seqno write lands, all waiters
      will unblock. But running the i-g-t reset testcase ZZ_hangman can
      expose this race, especially on slower hw with fewer cpu cores.
      
      Now looking forward to ARB_robustness and friends that's not the best
      possible behaviour, hence this patch adds a reset_counter to be able
      to detect any reset, even if a given thread never observed the
      in-progress state.
      
      The important part is to correctly order things:
      - The write side needs to increment the counter after any seqno gets
        reset.  Hence we need to do that at the end of the reset work, and
        again wake everyone up. We also need to place a barrier in between
        any possible seqno changes and the counter increment, since any
        unlock operations only guarantee that nothing leaks out, but not
        that a later load operation won't get moved ahead.
      - On the read side we need to ensure that no reset can sneak in and
        invalidate the seqno. In all cases we can use the one-sided barrier
        that unlock operations guarantee (of the lock protecting the
        respective seqno/ring pair) to ensure correct ordering. Hence it is
        sufficient to place the atomic read before the mutex/spin_unlock and
        no additional barriers are required.
      
      The end-result of all this is that we need to wake up everyone twice
      in a reset operation:
      - First, before the reset starts, to get any lockholders of the locks,
        so that the reset can proceed.
      - Second, after the reset is completed, to allow waiters to properly
        and reliably detect the reset condition and bail out.
      
      I admit that this entire reset_counter thing smells a bit like
      overkill, but I think it's justified since it makes it really explicit
      what the bail-out condition is. And we need a reset counter anyway to
      implement ARB_robustness, and imo with finer-grained locking on the
      horizon this is the most resilient scheme I could think of.
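      
      A hedged sketch of the read-side pattern described above (names are
      assumptions; the write side increments the counter at the end of the
      reset work and wakes all waiters a second time):
      
      /* Sketch: sample the reset counter before dropping the lock, wait,
       * then treat any change as "a reset completed while we slept". */
      static int sketch_wait_seqno_locked(struct drm_i915_private *dev_priv,
                                          struct intel_ring_buffer *ring,
                                          u32 seqno)
      {
              unsigned reset_counter;
              int ret;
      
              /* The mutex_unlock below is the one-sided barrier that orders
               * this read against any seqno the reset work may clobber. */
              reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
              mutex_unlock(&dev_priv->dev->struct_mutex);
      
              ret = lockfree_wait_for_seqno(ring, seqno); /* hypothetical wait */
      
              mutex_lock(&dev_priv->dev->struct_mutex);
              if (reset_counter !=
                  atomic_read(&dev_priv->gpu_error.reset_counter))
                      return -EAGAIN; /* a reset invalidated our seqno */
      
              return ret;
      }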
      
      v2: Drop spurious change in the wait_for_error EXIT_COND - we only
      need to wait until we leave the reset-in-progress wedged state.
      
      v3: Don't play tricks with barriers in the throttle ioctl, the
      spin_unlock is barrier enough.
      
      I've also considered using a little helper to grab the current
      reset_counter, but then decided that hiding the atomic_read isn't a
      great idea, since having it explicitly show up in the code is a nice
      reminder to reviewers to check the memory barriers.
      
      v4: Add a comment to explain why we need to fall through in
      __wait_seqno in the end variable assignments.
      
      v5: Review from Damien:
      - s/smb/smp/ in a comment
      - don't increment the reset counter after we've set it to WEDGED. Now
        we (again) properly wedge the gpu when the reset fails.
      Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>