1. 26 Jul, 2012 5 commits
  2. 20 Jul, 2012 1 commit
    • drm/i915: Add comments to explain the BSD tail write workaround · 12f55818
      Chris Wilson committed
      Having had to dive into the bspec to understand what each stage of the
      workaround meant, and how the ring broadcasting IDLE corresponded
      with the GT powering down the ring (i.e. rc6), add comments to aid
      the next reader.
      
      And since the register "is used to control all aspects of PSMI and power
      saving functions" that makes it quite interesting to inspect with
      regards to RC6 hangs, so add it to the error-state.
      
      v2: Rediscover the piece of magic, set the RNCID to 0 before waiting for
      the ring to wake up.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      12f55818
  3. 05 Jul, 2012 5 commits
    • drm/i915: introduce for_each_encoder_on_crtc · 6c2b7c12
      Daniel Vetter committed
      We already have this pattern at quite a few places, and moving part of
      the modeset helper stuff into the driver will add more.
      
      v2: Don't clobber the crtc struct name with the macro parameter ...
      
      v3: Convert two more places noticed by Paulo Zanoni.
      Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      6c2b7c12
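
The iteration pattern described in 6c2b7c12 can be sketched in plain C. This is a minimal userspace illustration, not the driver's macro: the singly linked list, the struct layouts, the parameter name `crtc_` and the helper `count_encoders_on_crtc` are all assumptions made for the example. It also shows why the v2 note matters: if the macro parameter were named `crtc`, the preprocessor would rewrite the `->crtc` member access as well.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct crtc_sketch {
	int id;
};

struct encoder_sketch {
	struct crtc_sketch *crtc;     /* CRTC this encoder is attached to */
	struct encoder_sketch *next;  /* next encoder on the device-wide list */
};

/*
 * Walk every encoder on the list and run the loop body only for those
 * attached to the given CRTC. The parameter is called crtc_ (not crtc)
 * so that it cannot clobber the ->crtc member name during expansion.
 */
#define for_each_encoder_on_crtc(head, crtc_, enc) \
	for ((enc) = (head); (enc) != NULL; (enc) = (enc)->next) \
		if ((enc)->crtc == (crtc_))

/* Example user: count the encoders attached to one CRTC. */
static int count_encoders_on_crtc(struct encoder_sketch *head,
				  struct crtc_sketch *crtc)
{
	struct encoder_sketch *enc;
	int n = 0;

	for_each_encoder_on_crtc(head, crtc, enc)
		n++;

	return n;
}
```

Because the macro ends in a bare `if`, the loop body should not be followed by an `else` that is meant to belong to an outer statement.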
    • drm/i915: non-interruptible sleeps can't handle -EAGAIN · d6b2c790
      Daniel Vetter committed
      So don't return -EAGAIN, even in the case of a gpu hang. Remap it to
      -EIO instead. Note that this isn't really an issue with
      interruptability, but more that we have quite a few codepaths (mostly
      around kms stuff) that simply can't handle any errors and hence not
      even -EAGAIN. Instead of adding proper failure paths so that we could
      restart these ioctls we've opted for the cheap way out of sleeping
      non-interruptibly.  Which works everywhere but when the gpu dies,
      which this patch fixes.
      
      So essentially interruptible == false means 'wait for the gpu or die
      trying'.
      
      This patch is a bit ugly because intel_ring_begin is all non-interruptible
      and hence only returns -EIO. But as the comment in there says,
      auditing all the callsites would be a pain.
      
      To avoid duplicating code, reuse i915_gem_check_wedge in __wait_seqno
      and intel_wait_ring_buffer. Also use the opportunity to clarify the
      different cases in i915_gem_check_wedge a bit with comments.
      
      v2: Don't access dev_priv->mm.interruptible from check_wedge - we
      might not hold dev->struct_mutex, making this racy. Instead pass
      interruptible in as a parameter. I've noticed this because I've hit a
      BUG_ON(!mutex_is_locked) at the top of check_wedge. This has been
      added in
      
      commit b4aca010
      Author: Ben Widawsky <ben@bwidawsk.net>
      Date:   Wed Apr 25 20:50:12 2012 -0700
      
          drm/i915: extract some common olr+wedge code
      
      although that commit is missing any justification for this. I guess
      it's just copy&paste, because the same commit adds the same BUG_ON
      check to check_olr, where it indeed makes sense.
      
      But in check_wedge everything we access is protected by other means,
      so this is superfluous. And because it now gets in the way (we add a
      new caller in __wait_seqno, which can be called without
      dev->struct_mutex) let's just remove it.
      
      v3: Group all the i915_gem_check_wedge refactoring into this patch, so
      that this patch here is all about not returning -EAGAIN to callsites
      that can't handle syscall restarting.
      
      v4: Add clarification of what interruptible == false means in our code,
      requested by Ben Widawsky.
      
      v5: Fix EAGAIN misspelling noticed by Chris Wilson.
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d6b2c790
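
The error remapping described in d6b2c790 can be illustrated with a small userspace sketch. The struct and function names below are hypothetical and the hang state is reduced to a single flag; the real `i915_gem_check_wedge` distinguishes more cases. The point shown is only that `interruptible` is passed in as a parameter (the v2 change) and that a non-interruptible caller never sees -EAGAIN.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's hang-tracking state. */
struct gpu_state_sketch {
	bool wedged; /* a GPU hang has been detected */
};

/*
 * Returns 0 when the GPU is healthy. When it is hung, interruptible
 * callers get -EAGAIN so the ioctl can be restarted, while
 * non-interruptible callers get -EIO because they have no restart path.
 */
static int check_wedge_sketch(const struct gpu_state_sketch *gpu,
			      bool interruptible)
{
	if (!gpu->wedged)
		return 0;

	if (!interruptible)
		return -EIO;

	return -EAGAIN;
}
```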
    • drm/i915: get rid of dev_priv->info->has_pch_split · 45e6e3a1
      Paulo Zanoni committed
      Previously we had has_pch_split to tell us whether we had a PCH or not
      and we also had dev_priv->pch_type to tell us which kind of PCH it
      was, but it could only be used if we were 100% sure we did have a PCH.
      Now that PCH_NONE was added to dev_priv->pch_type we don't need
      has_pch_split anymore: we can just check for pch_type != PCH_NONE.
      
      The HAS_PCH_{IBX,CPT,LPT} macros use dev_priv->pch_type, so they can
      only be called after intel_detect_pch. The HAS_PCH_SPLIT macro looks
      at dev_priv->info->has_pch_split, which is available earlier.
      
      Since the goal is to implement HAS_PCH_SPLIT using dev_priv->pch_type
      instead of dev_priv->info->has_pch_split, we need to make sure that
      intel_detect_pch is called before any calls to HAS_PCH_SPLIT are made.
      So we moved the intel_detect_pch call to an earlier stage.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      45e6e3a1
    • drm/i915: add PCH_NONE to enum intel_pch · f0350830
      Paulo Zanoni committed
      And rely on the fact that it's 0 to assume that machines without a PCH
      will have PCH_NONE as dev_priv->pch_type.
      
      Just today I finally realized that HAS_PCH_IBX is true for machines
      without a PCH. IMHO this is totally counter-intuitive and I don't
      think it's a good idea to assume that we're going to check for
      HAS_PCH_IBX only after we check for HAS_PCH_SPLIT.
      
      I believe that in the future we'll have more PCH types and checks
      like:
      
          if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
      
      will become more and more common. There's a good chance that we may
      break non-PCH machines by adding these checks in code that runs on all
      machines. I also believe that the HAS_PCH_SPLIT check will become less
      common as we add more and more different PCH types. We'll probably
      start replacing checks like:
      
          if (HAS_PCH_SPLIT(dev))
              foo();
          else
              bar();
      
      with:
      
          if (HAS_PCH_NEW(dev))
              baz();
          else if (HAS_PCH_OLD(dev) || HAS_PCH_IBX(dev))
              foo();
          else
              bar();
      
      and this may break gen 2/3/4.
      
      As far as we have investigated, this patch will affect the behavior of
      intel_hdmi_dpms and intel_dp_link_down on gen 4. In both functions the
      code inside the HAS_PCH_IBX check is for IBX-specific workarounds, so
      we should be safe. If we start bisecting gen 2/3/4 bugs to this commit
      we should consider replacing the HAS_PCH_IBX checks with something
      else.
      
      V2: Improve commit message, list possible side effects and solution.
      Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      f0350830
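
The effect of f0350830 together with 45e6e3a1 can be sketched as follows. This is an illustration under stated assumptions, not the driver's headers: `struct i915_private_sketch` is hypothetical and the macros take that struct directly rather than a `dev` pointer. It shows why making `PCH_NONE` the zero value matters: a zero-initialized device structure no longer looks like it has an IBX PCH, and `HAS_PCH_SPLIT` can be derived from `pch_type` alone.

```c
/* PCH_NONE is deliberately 0, so zero-initialized state means "no PCH". */
enum intel_pch {
	PCH_NONE = 0,
	PCH_IBX,
	PCH_CPT,
	PCH_LPT
};

/* Hypothetical, stripped-down stand-in for the driver private data. */
struct i915_private_sketch {
	enum intel_pch pch_type;
};

#define HAS_PCH_IBX(p)   ((p)->pch_type == PCH_IBX)
#define HAS_PCH_CPT(p)   ((p)->pch_type == PCH_CPT)
#define HAS_PCH_LPT(p)   ((p)->pch_type == PCH_LPT)

/* No separate has_pch_split flag: any type other than PCH_NONE is a PCH. */
#define HAS_PCH_SPLIT(p) ((p)->pch_type != PCH_NONE)
```

As the commit message notes, these checks are only meaningful once PCH detection has filled in `pch_type`.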
    • drm/i915: fix up ilk rc6 disabling confusion · 930ebb46
      Daniel Vetter committed
      While creating the new enable/disable_gt_powersave functions in
      
      commit 8090c6b9
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Sun Jun 24 16:42:32 2012 +0200
      
          drm/i915: wrap up gt powersave enabling functions
      
      I've botched up the handling of ironlake_disable_rc6. Fix this up by
      calling it at the right place. Note though that ironlake_disable_rc6
      does a bit more than just disabling rc6 - it also tears down all the
      allocated context objects.
      
      Hence we need to move intel_teardown_rc6 out and directly call it from
      intel_modeset_cleanup.
      
      Also properly mark ironlake_enable_rc6 as static and kill the un-used
      declaration in i915_drv.h.
      
      Note: In review a question popped out why disable_rc6 also tears down
      the backing object and why we should move that out - it's simply for
      consistency with gen6+ rps code, which does it that way.
      
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      930ebb46
  4. 04 Jul, 2012 1 commit
  5. 21 Jun, 2012 1 commit
  6. 20 Jun, 2012 1 commit
  7. 14 Jun, 2012 6 commits
    • drm/i915: reset the GPU on context fini · 8e96d9c4
      Ben Widawsky committed
      It's the only way we know how to make the GPU actually forget about the
      default context.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      8e96d9c4
    • drm/i915/context: create & destroy ioctls · 84624813
      Ben Widawsky committed
      Add the interfaces to allow user space to create and destroy contexts.
      Contexts are destroyed automatically if the file descriptor for the dri
      device is closed.
      
      Following convention as usual here causes checkpatch warnings.
      
      v2: with is_initialized, no longer need to init at create
      drop the context switch on create (daniel)
      
      v3: Use interruptible lock (Chris)
      return -ENODEV in !GEM case (Chris)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      84624813
    • drm/i915: add ccid to error state · b9a3906b
      Ben Widawsky committed
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      b9a3906b
    • drm/i915: context switch implementation · e0556841
      Ben Widawsky committed
      Implement the context switch code as well as the interfaces to do the
      context switch. This patch also doesn't match 1:1 with the RFC patches.
      The main difference is that from Daniel's responses the last context
      object is now stored instead of the last context. This allows us
      to free the context data structure and the context object independently.
      
      There is room for optimization: this code will pin the context object
      until the next context is active. The optimal way to do it is to
      actually pin the object, move it to the active list, do the context
      switch, and then unpin it. This allows the eviction code to actually
      evict the context object if needed.
      
      The context switch code is missing workarounds, they will be implemented
      in future patches.
      
      v2: actually do obj->dirty=1 in switch (daniel)
      Modified comment around above
      Remove flags to context switch (daniel)
      Move mi_set_context code to i915_gem_context.c (daniel)
      Remove seqno , use lazy request instead (daniel)
      
      v3: use i915_gem_request_next_seqno instead of
            outstanding_lazy_request (Daniel)
      remove id's from trace events (Daniel)
      Put the context BO in the instruction domain (Daniel)
      Don't unref the BO if context switch fails (Chris)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      e0556841
    • drm/i915: context basic create & destroy · 40521054
      Ben Widawsky committed
      Invent an abstraction for a hw context which is passed around through
      the core functions. The main bit a hw context holds is the buffer object
      which backs the context. The rest of the members are just helper
      functions. Specifically the ring member, which could likely go away if
      we decide to never implement whatever other hw context support exists.
      
      Of note here is the introduction of the 64k alignment constraint for the
      BO. If contexts become heavily used, we should consider tweaking this
      down to 4k. Until the contexts are merged and tested a bit though, I
      think 64k is a nice start (based on docs).
      
      Since we don't yet switch contexts, there is really not much complexity
      here. Creation/destruction works pretty much as one would expect. An idr
      is used to generate the context id numbers which are unique per file
      descriptor.
      
      v2: add DRM_DEBUG_DRIVERS to distinguish ENOMEM failures (ben)
      convert a BUG_ON to WARN_ON, default destruction is still fatal (ben)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      40521054
    • drm/i915: preliminary context support · 254f965c
      Ben Widawsky committed
      Very basic code for context setup/destruction in the driver.
      
      Adds the file i915_gem_context.c. This file implements HW context
      support. On gen5+ a HW context consists of an opaque GPU object which is
      referenced at times of context saves and restores.  With RC6 enabled,
      the context is also referenced as the GPU enters and exits from RC6
      (GPU has its own internal power context, except on gen5).  Though
      something like a context does exist for the media ring, the code only
      supports contexts for the render ring.
      
      In software, there is a distinction between contexts created by the
      user, and the default HW context. The default HW context is used by GPU
      clients that do not request setup of their own hardware context. The
      default context's state is never restored to help prevent programming
      errors. This would happen if a client ran and piggy-backed off another
      client's GPU state.  The default context only exists to give the GPU some
      offset to load as the current to invoke a save of the context we
      actually care about. In fact, the code could likely be constructed,
      albeit in a more complicated fashion, to never use the default context,
      though that limits the driver's ability to swap out, and/or destroy
      other contexts.
      
      All other contexts are created as a request by the GPU client. These
      contexts store GPU state, and thus allow GPU clients to not re-emit
      state (and potentially query certain state) at any time. The kernel
      driver makes certain that the appropriate commands are inserted.
      
      There are 4 entry points into the contexts, init, fini, open, close.
      The names are self-explanatory except that init can be called during
      reset, and also during pm thaw/resume. As we expect our context to be
      preserved across these events, we do not reinitialize in this case.
      
      As Adam Jackson pointed out, the cutoff of 1MB where a HW context is
      considered too big is arbitrary. The reason for this is even though
      context sizes are increasing with every generation, they have yet to
      eclipse even 32k. If we somehow read back way more than that, it
      probably means BIOS has done something strange, or we're running on a
      platform that wasn't designed for this.
      
      v2: rename load/unload to init/fini (daniel)
      remove ILK support for get_size() (indirectly daniel)
      add HAS_HW_CONTEXTS macro to clarify supported platforms (daniel)
      added comments (Ben)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      254f965c
  8. 13 Jun, 2012 1 commit
  9. 05 Jun, 2012 1 commit
    • drm/i915: hold forcewake around ring hw init · b7884eb4
      Daniel Vetter committed
      Empirical evidence suggests that we need to: On at least one ivb
      machine when running the hangman i-g-t test, the rings don't
      initialize properly - the RING_START registers seem to be stuck at
      all zeros.
      
      Holding forcewake around these register init sequences makes chip reset
      reliable again. Note that this is not the first such issue:
      
      commit f01db988
      Author: Sean Paul <seanpaul@chromium.org>
      Date:   Fri Mar 16 12:43:22 2012 -0400
      
          drm/i915: Add wait_for in init_ring_common
      
      added delay loops to make RING_START and RING_CTL initialization
      reliable on the blt ring at boot-up. So I guess it won't hurt if we do
      this unconditionally for all force_wake needing gpus.
      
      To avoid copy&pasting of the HAS_FORCE_WAKE check I've added a new
      intel_info bit for that.
      
      v2: Fixup missing commas in static struct and properly handling the
      error case in init_ring_common, both noticed by Jani Nikula.
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: Yang Guang <guang.a.yang@intel.com>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50522
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b7884eb4
  10. 31 May, 2012 3 commits
    • i915: add dma-buf vmap support for exporting vmapped buffer · 9a70cc2a
      Dave Airlie committed
      This is used to export a vmapping to the udl driver so that
      i915 and udl can share the udl scanout.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      9a70cc2a
    • drm/i915: remap l3 on hw init · b9524a1e
      Ben Widawsky committed
      If any l3 rows have been previously remapped, we must remap them after
      GPU reset/resume too.
      
      v2: Just return (no warn) on remapping init if not IVB (Jesse)
      Move the check of schizo userspace to i915_gem_l3_remap (Jesse)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b9524a1e
    • drm/i915: Dynamic Parity Detection handling · e3689190
      Ben Widawsky committed
      On IVB hardware we are given an interrupt whenever a parity error
      occurs in the L3 cache. The L3 cache is used by internal GPU clients
      only.  This is a very rare occurrence (in fact to test this I need to
      use specially instrumented silicon).
      
      When a row in the L3 cache detects a parity error the HW generates an
      interrupt. The interrupt is masked in GTIMR until we get a chance to
      read some registers and alert userspace via a uevent. With this
      information userspace can use a sysfs interface (follow-up patch) to
      remap those rows.
      
      Way above my level of understanding, but if a given row fails, it is
      statistically more likely to fail again than a row which has not failed.
      Therefore it is desirable for an operating system to maintain a lifelong
      list of failing rows and always remap any bad rows on driver load.
      Hardware limits the number of rows that are remappable per bank/subbank,
      and should more than that many rows detect parity errors, software
      should maintain a list of the most frequent errors, and remap those
      rows.
      
      V2: Drop WARN_ON(IS_GEN6) (Jesse)
      DRM_DEBUG row/bank/subbank on error (Jesse)
      Comment updates (Jesse)
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e3689190
  11. 25 May, 2012 2 commits
    • drm/i915: s/i915_wait_request/i915_wait_seqno/g · 199b2bc2
      Ben Widawsky committed
      Wait request is poorly named IMO. After working with these functions for
      some time, I feel it's much clearer to name the functions more
      appropriately.
      
      Of course we must update the callers to use the new name as well.
      
      This leaves room within our namespace for a *real* wait request function
      at some point.
      
      Note to maintainer: this patch is optional.
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      199b2bc2
    • drm/i915: wait render timeout ioctl · 23ba4fd0
      Ben Widawsky committed
      This helps implement GL_ARB_sync but stops short of allowing full blown
      sync objects. Finally we can use the new timed seqno waiting function
      to allow userspace to wait on a buffer object with a timeout. This
      implements that interface.
      
      The IOCTL will take as input a buffer object handle, and a timeout in
      nanoseconds (flags is currently optional but will likely be used for
      permutations of flush operations). Users may specify 0 nanoseconds to
      instantly check.
      
      The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
      non-zero timeout parameter the wait ioctl will wait for the given number
      of nanoseconds on an object becoming unbusy. Since the wait itself does
      so without holding struct_mutex the object may become re-busied before
      this completes. A similar but shorter race condition exists in the busy
      ioctl.
      
      v2: ETIME/ERESTARTSYS instead of changing to EBUSY, and EAGAIN (Chris)
      Flush the object from the gpu write domain (Chris + Daniel)
      Fix leaked refcount in good case (Chris)
      Naturally align ioctl struct (Chris)
      
      v3: Drop lock after getting seqno to avoid ugly dance (Chris)
      
      v4: check for 0 timeout after olr check to allow polling (Chris)
      
      v5: Updated the comment. (Chris)
      
      v6: Return -ETIME instead of -EBUSY when timeout_ns is 0 (Daniel)
      Fix the commit message comment to be less ugly (Ben)
      Add a warning to check the return timespec (Ben)
      
      v7: Use DRM_AUTH for the ioctl. (Eugeni)
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      23ba4fd0
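
The return-value behaviour described in 23ba4fd0 can be modelled with a tiny pure function. This is only a sketch of the semantics stated in the commit message, not the ioctl itself: `wait_sketch` and its `busy_ns` parameter (how long the object will stay busy) are hypothetical, and real waiting, signal handling (-ERESTARTSYS) and the re-busy race are left out.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Model of the wait result:
 *  - idle object: success, whatever the timeout;
 *  - timeout of 0 on a busy object: poll only, report -ETIME (v6);
 *  - otherwise succeed only if the object goes idle within the timeout.
 */
static int wait_sketch(uint64_t busy_ns, uint64_t timeout_ns)
{
	if (busy_ns == 0)
		return 0;       /* already idle */

	if (timeout_ns == 0)
		return -ETIME;  /* busy, and the caller asked not to sleep */

	return busy_ns <= timeout_ns ? 0 : -ETIME;
}
```

With `timeout_ns == 0` the function behaves like a busy query, which is the equivalence with the busy ioctl that the commit message points out.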
  12. 23 May, 2012 1 commit
    • i915: add dmabuf/prime buffer sharing support. · 1286ff73
      Daniel Vetter committed
      This adds handle->fd and fd->handle support to i915, this is to allow
      for offloading of rendering in one direction and outputs in the other.
      
      v2 from Daniel Vetter:
      - fixup conflicts with the prepare/finish gtt prep work.
      - implement ppgtt binding support.
      
      Note that we have squat i-g-t testcoverage for any of the lifetime and
      access rules dma_buf/prime support brings along. And there are quite a
      few intricate situations here.
      
      Also note that the integration with the existing code is a bit
      hackish, especially around get_gtt_pages and put_gtt_pages. It imo
      would be easier with the prep code from Chris Wilson's unbound series,
      but that is for 3.6.
      
      Also note that I didn't bother to put the new prepare/finish gtt hooks
      to good use by moving the dma_buf_map/unmap_attachment calls in there
      (like we've originally planned for).
      
      Last but not least this patch is only compile-tested, but I've changed
      very little compared to Dave Airlie's version. So there's a decent
      chance v2 on drm-next works as well as v1 on 3.4-rc.
      
      v3: Right when I've hit send I've noticed that I've screwed up one
      obj->sg_list (for dmar support) and obj->sg_table (for prime support)
      distinction. We should be able to merge these 2 paths, but that's
      material for another patch.
      
      v4: fix the error reporting bugs pointed out by ickle.
      
      v5: fix another error, and stop non-gtt mmaps on shared objects
      stop pread/pwrite on imported objects, add fake kmap
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      1286ff73
  13. 20 May, 2012 2 commits
    • drm/i915: Introduce for_each_ring() macro · b4519513
      Chris Wilson committed
      In many places we wish to iterate over the rings associated with the
      GPU, so refactor them to use a common macro.
      
      Along the way, there are a few code removals that should be side-effect
      free and some rearrangement which should only have a cosmetic impact,
      such as error-state.
      
      Note that this slightly changes the semantics in the hangcheck code:
      We now always cycle through all enabled rings instead of
      short-circuiting the logic.
      
      v2: Pull in a couple of suggestions from Ben and Daniel for
      intel_ring_initialized() and not removing the warning (just moving them
      to a new home, closer to the error).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
      [danvet: Added note to commit message about the small behaviour
      change, suggested by Ben Widawsky.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b4519513
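
The macro introduced in b4519513 can be approximated in userspace as below. The struct layouts, `NUM_RINGS_SKETCH`, `ring_initialized` and `count_hung_rings` are assumptions made for the example; only the idea comes from the commit message: one macro walks all rings and skips those that are not initialized, and a hangcheck-style user visits every enabled ring instead of stopping at the first hit.

```c
#include <stdbool.h>

#define NUM_RINGS_SKETCH 3

/* Hypothetical, simplified ring and device structures. */
struct hw_ring_sketch {
	bool initialized; /* ring exists and was set up on this GPU */
	bool hung;        /* ring made no progress since the last check */
};

struct gpu_sketch {
	struct hw_ring_sketch ring[NUM_RINGS_SKETCH];
};

static bool ring_initialized(const struct hw_ring_sketch *ring)
{
	return ring->initialized;
}

/* Iterate over all rings, running the body only for initialized ones. */
#define for_each_ring(ring_, gpu_, i_) \
	for ((i_) = 0; (i_) < NUM_RINGS_SKETCH; (i_)++) \
		if (((ring_) = &(gpu_)->ring[(i_)]), ring_initialized((ring_)))

/* Hangcheck-style user: inspects every enabled ring, no short-circuit. */
static int count_hung_rings(struct gpu_sketch *gpu)
{
	struct hw_ring_sketch *ring;
	int i, hung = 0;

	for_each_ring(ring, gpu, i) {
		if (ring->hung)
			hung++;
	}

	return hung;
}
```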
    • drm/i915: program WM_LINETIME on Haswell · 1f8eeabf
      Eugeni Dodonov committed
      The line time can be programmed according to the number of horizontal
      pixels vs effective pixel rate ratio.
      
      v2: improve comment as per Chris Wilson suggestion
      
      v3: incorporate latest changes in specs.
      
      v4: move into wm update routine, also mention that the same routine can
      program IPS watermarks. We do not have their enablement code yet, nor
      handle the required clock settings at the moment, so this patch won't
      program those values for now.
      Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      1f8eeabf
  14. 06 May, 2012 3 commits
    • drm/i915: kill flags parameter for reset functions · d4b8bb2a
      Daniel Vetter committed
      Only half of them even cared, and it's always the same one.
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d4b8bb2a
    • drm/i915: rework dev->first_error locking · 742cbee8
      Daniel Vetter committed
      - reduce the irq disabled section, even for a debugfs file this was
        way too long.
      - always disable irqs when taking the lock.
      
      v2: Thou shalt not mistake locking for reference counting, so:
      - reference count the error_state to protect from concurrent freeing.
        This will be only really used in the next patch.
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      742cbee8
    • drm/i915: add interface to simulate gpu hangs · e5eb3d63
      Daniel Vetter committed
      gpu reset is a very important piece of our infrastructure.
      Unfortunately we only really test it by actually hanging the gpu,
      which often has bad side-effects for the entire system. And the gpu
      hang handling code is one of the rather complicated pieces of code we
      have, consisting of
      - hang detection
      - error capture
      - actual gpu reset
      - reset of all the gem bookkeeping
      - reinitialization of the entire gpu
      
      This patch adds a debugfs file to selectively stop rings by ceasing to
      update the hw tail pointer, which will result in the gpu no longer
      updating its head pointer and eventually in the hangcheck firing.
      This way we can exercise the gpu hang code under controlled conditions
      without a dying gpu taking down the entire system.
      
      Patch motivated by me forgetting to properly reinitialize ppgtt after
      a gpu reset.
      
      Usage:
      
      echo $((1 << $ringnum)) > i915_ring_stop # stops one ring
      
      echo 0xffffffff > i915_ring_stop # stops all, future-proof version
      
      then run whatever testload is desired. i915_ring_stop automatically
      resets after a gpu hang is detected to avoid hanging the gpu too fast
      and declaring it wedged.
      
      v2: Incorporate feedback from Chris Wilson.
      
      v3: Add the missing cleanup.
      
      v4: Fix up inconsistent size of ring_stop_read vs _write, noticed by
      Eugeni Dodonov.
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      e5eb3d63
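
The bitmask written to i915_ring_stop in e5eb3d63 and its effect on tail updates can be sketched in userspace C. The struct and function names are hypothetical; the sketch assumes one bit per ring number (as in the `1 << $ringnum` usage above) and ring numbers below 32. A stopped ring keeps advancing its software tail but the hardware tail is no longer written, which is what eventually lets the hangcheck fire.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring state: software view vs. what was written to hw. */
struct stop_ring_sketch {
	unsigned int id;   /* ring number, assumed to be < 32 */
	uint32_t sw_tail;  /* tail as tracked by the driver */
	uint32_t hw_tail;  /* tail last written to the (simulated) register */
};

/* Mask that stops exactly one ring, mirroring $((1 << $ringnum)). */
static uint32_t ring_stop_mask(unsigned int ringnum)
{
	return UINT32_C(1) << ringnum;
}

static bool ring_is_stopped(uint32_t stop_rings, unsigned int ringnum)
{
	return (stop_rings & ring_stop_mask(ringnum)) != 0;
}

/* Advance the ring; skip the hw tail write when the ring is stopped. */
static void ring_advance(struct stop_ring_sketch *ring, uint32_t stop_rings,
			 uint32_t new_tail)
{
	ring->sw_tail = new_tail;

	if (ring_is_stopped(stop_rings, ring->id))
		return; /* simulated hang: the GPU never sees the new tail */

	ring->hw_tail = new_tail;
}
```

Writing 0xffffffff sets every bit, which is why the commit message calls it the future-proof way to stop all rings.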
  15. 03 May, 2012 7 commits