1. 14 7月, 2015 3 次提交
    • I
      drm/i915: remove unused has_dma_mapping flag · 5ec5b516
      Imre Deak 提交于
      After the previous patch this flag will check always clear, as it's
      never set for shmem backed and userptr objects, so we can remove it.
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: Yeah this isn't really fixes but it's a nice cleanup to
      clarify the code but not really worth the hassle of backmerging. So
      just add to -fixes, we're still early in -rc.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5ec5b516
    • I
      drm/i915: avoid leaking DMA mappings · e2273302
      Imre Deak 提交于
      We have 3 types of DMA mappings for GEM objects:
      1. physically contiguous for stolen and for objects needing contiguous
         memory
      2. DMA-buf mappings imported via a DMA-buf attach operation
      3. SG DMA mappings for shmem backed and userptr objects
      
      For 1. and 2. the lifetime of the DMA mapping matches the lifetime of the
      corresponding backing pages and so in practice we create/release the
      mapping in the object's get_pages/put_pages callback.
      
      For 3. the lifetime of the mapping matches that of any existing GPU binding
      of the object, so we'll create the mapping when the object is bound to
      the first vma and release the mapping when the object is unbound from its
      last vma.
      
      Since the object can be bound to multiple vmas, we can end up creating a
      new DMA mapping in the 3. case even if the object already had one. This
      is not allowed by the DMA API and can lead to leaked mapping data and
      IOMMU memory space starvation in certain cases. For example HW IOMMU
      drivers (intel_iommu) allocate a new range from their memory space
      whenever a mapping is created, silently overriding a pre-existing
      mapping.
      
      Fix this by moving the creation/removal of DMA mappings to the object's
      get_pages/put_pages callbacks. These callbacks already check for and do
      an early return in case of any nested calls. This way objects of the 3.
      case also become more like the other object types.
      
      I noticed this issue by enabling DMA debugging, which got disabled after
      a while due to its internal mapping tables getting full. It also reported
      errors in connection to random other drivers that did a DMA mapping for
      an address that was previously mapped by i915 but was never released.
      Besides these diagnostic messages and the memory space starvation
      problem for IOMMUs, I'm not aware of this causing a real issue.
      
      The fix is based on a patch from Chris.
      
      v2:
      - move the DMA mapping create/remove calls to the get_pages/put_pages
        callbacks instead of adding new callbacks for these (Chris)
      v3:
      - also fix the get_page cache logic on the userptr async path (Chris)
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e2273302
    • T
      drm/i915: Snapshot seqno of most recently submitted request. · 94f7bbe1
      Tomas Elf 提交于
      The hang checker needs to inspect whether or not the ring request list is empty
      as well as if the given engine has reached or passed the most recently
      submitted request. The problem with this is that the hang checker cannot grab
      the struct_mutex, which is required in order to safely inspect requests since
      requests might be deallocated during inspection. In the past we've had kernel
      panics due to this very unsynchronized access in the hang checker.
      
      One solution to this problem is to not inspect the requests directly since
      we're only interested in the seqno of the most recently submitted request - not
      the request itself. Instead the seqno of the most recently submitted request is
      stored separately, which the hang checker then inspects, circumventing the
      issue of synchronization from the hang checker entirely.
      
      This fixes a regression introduced in
      
      commit 44cdd6d2
      Author: John Harrison <John.C.Harrison@Intel.com>
      Date:   Mon Nov 24 18:49:40 2014 +0000
      
          drm/i915: Convert 'ring_idle()' to use requests not seqnos
      
      v2 (Chris Wilson):
      - Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
      than compute it over again.
      - Remove extra whitespace.
      
      Issue: VIZ-5998
      Signed-off-by: NTomas Elf <tomas.elf@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: Add regressing commit citation provided by Chris.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      94f7bbe1
  2. 15 6月, 2015 3 次提交
  3. 01 6月, 2015 1 次提交
  4. 27 5月, 2015 1 次提交
    • C
      drm/i915: Use spinlocks for checking when to waitboost · 8d3afd7d
      Chris Wilson 提交于
      In commit 1854d5ca
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Tue Apr 7 16:20:32 2015 +0100
      
          drm/i915: Deminish contribution of wait-boosting from clients
      
      we removed an atomic timer based check for allowing waitboosting and
      moved it below the mutex taken during RPS. However, that mutex can be
      held for long periods of time on Vallyview/Cherryview as communication
      with the PCU is slow. As clients may frequently wait for results (e.g.
      such as tranform feedback) we introduced contention between the client
      and the RPS worker. We can take advantage of the RPS worker, by
      switching the wait boost decision to use spin locks and defer the
      actual reclocking to the worker.
      
      Fixes a regression of up to 45% on Baytrail and Baswell!
      
      v2 (Daniel):
      - Use max_freq_softlimit instead of the not-yet-merged boost
        frequency.
      - Don't inject a fake irq into the boost work, instead treat
        client_boost as just another legit waker.
      
      v3: Drop the now unused mask (Chris).
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      8d3afd7d
  5. 22 5月, 2015 3 次提交
  6. 21 5月, 2015 4 次提交
    • C
      drm/i915: Convert RPS tracking to a intel_rps_client struct · 2e1b8730
      Chris Wilson 提交于
      Now that we have internal clients, rather than faking a whole
      drm_i915_file_private just for tracking RPS boosts, create a new struct
      intel_rps_client and pass it along when waiting.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: s/rq/req/]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      2e1b8730
    • C
      drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts · a6f766f3
      Chris Wilson 提交于
      Ring switches can occur many times per frame, and are often out of
      control, causing frequent RPS boosting for no practical benefit. Treat
      the sw semaphore synchronisation as a separate client and only allow it
      to boost once per busy/idle cycle.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: s/rq/req/]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a6f766f3
    • C
      drm/i915: Implement inter-engine read-read optimisations · b4716185
      Chris Wilson 提交于
      Currently, we only track the last request globally across all engines.
      This prevents us from issuing concurrent read requests on e.g. the RCS
      and BCS engines (or more likely the render and media engines). Without
      semaphores, we incur costly stalls as we synchronise between rings -
      greatly impacting the current performance of Broadwell versus Haswell in
      certain workloads (like video decode). With the introduction of
      reference counted requests, it is much easier to track the last request
      per ring, as well as the last global write request so that we can
      optimise inter-engine read read requests (as well as better optimise
      certain CPU waits).
      
      v2: Fix inverted readonly condition for nonblocking waits.
      v3: Handle non-continguous engine array after waits
      v4: Rebase, tidy, rewrite ring list debugging
      v5: Use obj->active as a bitfield, it looks cool
      v6: Micro-optimise, mostly involving moving code around
      v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
      v8: Rebase
      v9: Refactor i915_gem_object_sync() to allow the compiler to better
      optimise it.
      
      Benchmark: igt/gem_read_read_speed
      hsw:gt3e (with semaphores):
      Before: Time to read-read 1024k:		275.794µs
      After:  Time to read-read 1024k:		123.260µs
      
      hsw:gt3e (w/o semaphores):
      Before: Time to read-read 1024k:		230.433µs
      After:  Time to read-read 1024k:		124.593µs
      
      bdw-u (w/o semaphores):             Before          After
      Time to read-read 1x1:            26.274µs       10.350µs
      Time to read-read 128x128:        40.097µs       21.366µs
      Time to read-read 256x256:        77.087µs       42.608µs
      Time to read-read 512x512:       281.999µs      181.155µs
      Time to read-read 1024x1024:    1196.141µs     1118.223µs
      Time to read-read 2048x2048:    5639.072µs     5225.837µs
      Time to read-read 4096x4096:   22401.662µs    21137.067µs
      Time to read-read 8192x8192:   89617.735µs    85637.681µs
      
      Testcase: igt/gem_concurrent_blit (read-read and friends)
      Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
      [danvet: s/\<rq\>/req/g]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b4716185
    • D
      drm/i915: s/\<rq\>/req/g · eed29a5b
      Daniel Vetter 提交于
      The merged seqno->request conversion from John called request
      variables req, but some (not all) of Chris' recent patches changed
      those to just rq. We've had a lenghty (and inconclusive) discussion on
      irc which is the more meaningful name with maybe at most a slight bias
      towards req.
      
      Given that the "don't change names without good reason to avoid
      conflicts" rule applies, so lets go back to a req everywhere for
      consistency. I'll sed any patches for which this will cause conflicts
      before applying.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      [danvet: s/origina/merged/ as pointed out by Chris - the first
      mass-conversion patch was from Chris, the merged one from John.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      eed29a5b
  7. 20 5月, 2015 2 次提交
  8. 08 5月, 2015 6 次提交
  9. 24 4月, 2015 3 次提交
    • M
      drm/i915: Workaround to avoid lite restore with HEAD==TAIL · 53292cdb
      Michel Thierry 提交于
      WaIdleLiteRestore is an execlists-only workaround, and requires the driver
      to ensure that any context always has HEAD!=TAIL when attempting lite
      restore.
      
      Add two extra MI_NOOP instructions at the end of each request, but keep
      the requests tail pointing before the MI_NOOPs. We may not need to
      executed them, and this is why request->tail is sampled before adding
      these extra instructions.
      
      If we submit a context to the ELSP which has previously been submitted,
      move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.
      
      v2: Move overallocation to gen8_emit_request, and added note about
      sampling request->tail in commit message (Chris).
      
      v3: Remove redundant request->tail assignment in __i915_add_request, in
      lrc mode this is already set in execlists_context_queue.
      Do not add wa implementation details inside gem (Chris).
      
      v4: Apply the wa whenever the req has been resubmitted and update
      comment (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      53292cdb
    • D
      drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c
      Daniel Vetter 提交于
      Currently we have the problem that the decision whether ptes need to
      be (re)written is splattered all over the codebase. Move all that into
      i915_vma_bind. This needs a few changes:
      - Just reuse the PIN_* flags for i915_vma_bind and do the conversion
        to vma->bound in there to avoid duplicating the conversion code all
        over.
      - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
        around) explicit, add PIN_USER for that.
      - Two callers want to update ptes, give them a PIN_UPDATE for that.
      
      Of course we still want to avoid double-binding, but that should be
      taken care of:
      - A ppgtt vma will only ever see PIN_USER, so no issue with
        double-binding.
      - A ggtt vma with aliasing ppgtt needs both types of binding, and we
        track that properly now.
      - A ggtt vma without aliasing ppgtt could be bound twice. In the
        lower-level ->bind_vma functions hence unconditionally set
        GLOBAL_BIND when writing the ggtt ptes.
      
      There's still a bit room for cleanup, but that's for follow-up
      patches.
      
      v2: Fixup fumbles.
      
      v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      0875546c
    • D
      drm/i915: Remove misleading comment around bind_to_vm · cd102a68
      Daniel Vetter 提交于
      It's true that we might need to context switch, but both the signalling
      and implementation of the same are a few source files away. Remove it.
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      cd102a68
  10. 20 4月, 2015 2 次提交
  11. 16 4月, 2015 1 次提交
    • T
      drm/i915: Fix view type in warning message · 5678ad73
      Tvrtko Ursulin 提交于
      One month passed between posting a patch and it getting merged, and
      unfortunately even though it still applies, it needs fixing to account
      for changes in function parameters since:
      
         commit d385612e15b8b6eb3db328d83f1872ef8a381788
         Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
         Date:   Tue Mar 17 14:45:29 2015 +0000
      
             drm/i915: Log view type when printing warnings
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      [danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5678ad73
  12. 13 4月, 2015 2 次提交
    • C
      drm/i915: Remove obj->pin_mappable · 30154650
      Chris Wilson 提交于
      The obj->pin_mappable flag only exists for debug purposes and is a
      hindrance that is mistreated with rotated GGTT views. For debug
      purposes, it suffices to mark objects with pin_display as being of note.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      30154650
    • C
      drm/i915: Optimistically spin for the request completion · 2def4ad9
      Chris Wilson 提交于
      This provides a nice boost to mesa in swap bound scenarios (as mesa
      throttles itself to the previous frame and given the scenario that will
      complete shortly). It will also provide a good boost to systems running
      with semaphores disabled and so frequently waiting on the GPU as it
      switches rings. In the most favourable of microbenchmarks, this can
      increase performance by around 15% - though in practice improvements
      will be marginal and rarely noticeable.
      
      v2: Account for user timeouts
      v3: Limit the spinning to a single jiffie (~1us) at most. On an
      otherwise idle system, there is no scheduler contention and so without a
      limit we would spin until the GPU is ready.
      v4: Drop forcewake - the lazy coherent access doesn't require it, and we
      have no reason to believe that the forcewake itself improves seqno
      coherency - it only adds delay.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      2def4ad9
  13. 10 4月, 2015 9 次提交