1. 21 5月, 2015 3 次提交
    • C
      drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts · a6f766f3
      Chris Wilson 提交于
      Ring switches can occur many times per frame, and are often out of
      control, causing frequent RPS boosting for no practical benefit. Treat
      the sw semaphore synchronisation as a separate client and only allow it
      to boost once per busy/idle cycle.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      [danvet: s/rq/req/]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      a6f766f3
    • C
      drm/i915: Implement inter-engine read-read optimisations · b4716185
      Chris Wilson 提交于
      Currently, we only track the last request globally across all engines.
      This prevents us from issuing concurrent read requests on e.g. the RCS
      and BCS engines (or more likely the render and media engines). Without
      semaphores, we incur costly stalls as we synchronise between rings -
      greatly impacting the current performance of Broadwell versus Haswell in
      certain workloads (like video decode). With the introduction of
      reference counted requests, it is much easier to track the last request
      per ring, as well as the last global write request so that we can
      optimise inter-engine read read requests (as well as better optimise
      certain CPU waits).
      
      v2: Fix inverted readonly condition for nonblocking waits.
      v3: Handle non-continguous engine array after waits
      v4: Rebase, tidy, rewrite ring list debugging
      v5: Use obj->active as a bitfield, it looks cool
      v6: Micro-optimise, mostly involving moving code around
      v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
      v8: Rebase
      v9: Refactor i915_gem_object_sync() to allow the compiler to better
      optimise it.
      
      Benchmark: igt/gem_read_read_speed
      hsw:gt3e (with semaphores):
      Before: Time to read-read 1024k:		275.794µs
      After:  Time to read-read 1024k:		123.260µs
      
      hsw:gt3e (w/o semaphores):
      Before: Time to read-read 1024k:		230.433µs
      After:  Time to read-read 1024k:		124.593µs
      
      bdw-u (w/o semaphores):             Before          After
      Time to read-read 1x1:            26.274µs       10.350µs
      Time to read-read 128x128:        40.097µs       21.366µs
      Time to read-read 256x256:        77.087µs       42.608µs
      Time to read-read 512x512:       281.999µs      181.155µs
      Time to read-read 1024x1024:    1196.141µs     1118.223µs
      Time to read-read 2048x2048:    5639.072µs     5225.837µs
      Time to read-read 4096x4096:   22401.662µs    21137.067µs
      Time to read-read 8192x8192:   89617.735µs    85637.681µs
      
      Testcase: igt/gem_concurrent_blit (read-read and friends)
      Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
      [danvet: s/\<rq\>/req/g]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      b4716185
    • D
      drm/i915: s/\<rq\>/req/g · eed29a5b
      Daniel Vetter 提交于
      The merged seqno->request conversion from John called request
      variables req, but some (not all) of Chris' recent patches changed
      those to just rq. We've had a lenghty (and inconclusive) discussion on
      irc which is the more meaningful name with maybe at most a slight bias
      towards req.
      
      Given that the "don't change names without good reason to avoid
      conflicts" rule applies, so lets go back to a req everywhere for
      consistency. I'll sed any patches for which this will cause conflicts
      before applying.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      [danvet: s/origina/merged/ as pointed out by Chris - the first
      mass-conversion patch was from Chris, the merged one from John.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      eed29a5b
  2. 20 5月, 2015 2 次提交
  3. 08 5月, 2015 6 次提交
  4. 24 4月, 2015 3 次提交
    • M
      drm/i915: Workaround to avoid lite restore with HEAD==TAIL · 53292cdb
      Michel Thierry 提交于
      WaIdleLiteRestore is an execlists-only workaround, and requires the driver
      to ensure that any context always has HEAD!=TAIL when attempting lite
      restore.
      
      Add two extra MI_NOOP instructions at the end of each request, but keep
      the requests tail pointing before the MI_NOOPs. We may not need to
      executed them, and this is why request->tail is sampled before adding
      these extra instructions.
      
      If we submit a context to the ELSP which has previously been submitted,
      move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.
      
      v2: Move overallocation to gen8_emit_request, and added note about
      sampling request->tail in commit message (Chris).
      
      v3: Remove redundant request->tail assignment in __i915_add_request, in
      lrc mode this is already set in execlists_context_queue.
      Do not add wa implementation details inside gem (Chris).
      
      v4: Apply the wa whenever the req has been resubmitted and update
      comment (Chris).
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
      Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      53292cdb
    • D
      drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c
      Daniel Vetter 提交于
      Currently we have the problem that the decision whether ptes need to
      be (re)written is splattered all over the codebase. Move all that into
      i915_vma_bind. This needs a few changes:
      - Just reuse the PIN_* flags for i915_vma_bind and do the conversion
        to vma->bound in there to avoid duplicating the conversion code all
        over.
      - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
        around) explicit, add PIN_USER for that.
      - Two callers want to update ptes, give them a PIN_UPDATE for that.
      
      Of course we still want to avoid double-binding, but that should be
      taken care of:
      - A ppgtt vma will only ever see PIN_USER, so no issue with
        double-binding.
      - A ggtt vma with aliasing ppgtt needs both types of binding, and we
        track that properly now.
      - A ggtt vma without aliasing ppgtt could be bound twice. In the
        lower-level ->bind_vma functions hence unconditionally set
        GLOBAL_BIND when writing the ggtt ptes.
      
      There's still a bit room for cleanup, but that's for follow-up
      patches.
      
      v2: Fixup fumbles.
      
      v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
      0875546c
    • D
      drm/i915: Remove misleading comment around bind_to_vm · cd102a68
      Daniel Vetter 提交于
      It's true that we might need to context switch, but both the signalling
      and implementation of the same are a few source files away. Remove it.
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      cd102a68
  5. 20 4月, 2015 2 次提交
  6. 16 4月, 2015 1 次提交
    • T
      drm/i915: Fix view type in warning message · 5678ad73
      Tvrtko Ursulin 提交于
      One month passed between posting a patch and it getting merged, and
      unfortunately even though it still applies, it needs fixing to account
      for changes in function parameters since:
      
         commit d385612e15b8b6eb3db328d83f1872ef8a381788
         Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
         Date:   Tue Mar 17 14:45:29 2015 +0000
      
             drm/i915: Log view type when printing warnings
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      [danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      5678ad73
  7. 13 4月, 2015 2 次提交
    • C
      drm/i915: Remove obj->pin_mappable · 30154650
      Chris Wilson 提交于
      The obj->pin_mappable flag only exists for debug purposes and is a
      hindrance that is mistreated with rotated GGTT views. For debug
      purposes, it suffices to mark objects with pin_display as being of note.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      30154650
    • C
      drm/i915: Optimistically spin for the request completion · 2def4ad9
      Chris Wilson 提交于
      This provides a nice boost to mesa in swap bound scenarios (as mesa
      throttles itself to the previous frame and given the scenario that will
      complete shortly). It will also provide a good boost to systems running
      with semaphores disabled and so frequently waiting on the GPU as it
      switches rings. In the most favourable of microbenchmarks, this can
      increase performance by around 15% - though in practice improvements
      will be marginal and rarely noticeable.
      
      v2: Account for user timeouts
      v3: Limit the spinning to a single jiffie (~1us) at most. On an
      otherwise idle system, there is no scheduler contention and so without a
      limit we would spin until the GPU is ready.
      v4: Drop forcewake - the lazy coherent access doesn't require it, and we
      have no reason to believe that the forcewake itself improves seqno
      coherency - it only adds delay.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      2def4ad9
  8. 10 4月, 2015 12 次提交
  9. 01 4月, 2015 2 次提交
  10. 30 3月, 2015 1 次提交
  11. 27 3月, 2015 2 次提交
  12. 26 3月, 2015 1 次提交
    • C
      drm/i915: Keep ring->active_list and ring->requests_list consistent · 832a3aad
      Chris Wilson 提交于
      If we retire requests last, we may use a later seqno and so clear
      the requests lists without clearing the active list, leading to
      confusion. Hence we should retire requests first for consistency with
      the early return. The order used to be important as the lifecycle for
      the object on the active list was determined by request->seqno. However,
      the requests themselves are now reference counted removing the
      constraint from the order of retirement.
      
      Fixes regression from
      
      commit 1b5a433a
      Author: John Harrison <John.C.Harrison@Intel.com>
      Date:   Mon Nov 24 18:49:42 2014 +0000
      
          drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
      '
      
      and a
      
      	WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
      	WARN_ON(!list_empty(&vm->active_list))
      
      Identified by updating WATCH_LISTS:
      
      	[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
      	WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
      	WARN_ON(i915_verify_lists(ring->dev))
      
      Note that this is only a problem in evict_vm where the following happens
      after a retire_request has cleaned out all requests, but not all active
      bo:
      - intel_ring_idle called from i915_gpu_idle notices that no requests are
        outstanding and immediately returns.
      - i915_gem_retire_requests_ring called from i915_gem_retire_requests also
        immediately returns when there's no request, still leaving the bo on the
        active list.
      - evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting
        all active objects that there's still stuff left that shouldn't be
        there.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NJani Nikula <jani.nikula@intel.com>
      832a3aad
  13. 23 3月, 2015 2 次提交
    • T
      drm/i915/skl: Support secondary (rotated) frame buffer mapping · 50470bb0
      Tvrtko Ursulin 提交于
      90/270 rotated scanout needs a rotated GTT view of the framebuffer.
      
      This is put in a separate VMA with a dedicated ggtt view and wired such that
      it is created when a framebuffer is pinned to a 90/270 rotated plane.
      
      Rotation is only possible with Yb/Yf buffers and error is propagated to
      user space in case of a mismatch.
      
      Special rotated page view is constructed at the VMA creation time by
      borrowing the DMA addresses from obj->pages.
      
      v2:
          * Do not bother with pages for rotated sg list, just populate the DMA
            addresses. (Daniel Vetter)
          * Checkpatch cleanup.
      
      v3:
          * Rebased on top of new plane handling (create rotated mapping when
            setting the rotation property).
          * Unpin rotated VMA on unpinning from display plane.
          * Simplify rotation check using bitwise AND. (Chris Wilson)
      
      v4:
          * Fix unpinning of optional rotated mapping so it is really considered
            to be optional.
      
      v5:
         * Rebased for fb modifier changes.
         * Rebased for atomic commit.
         * Only pin needed view for display. (Ville Syrjälä, Daniel Vetter)
      
      v6:
         * Rebased after preparatory work has been extracted out. (Daniel Vetter)
      
      v7:
         * Slightly simplified tiling geometry calculation.
         * Moved rotated GGTT view implementation into i915_gem_gtt.c (Daniel Vetter)
      
      v8:
         * Do not use i915_gem_obj_size to get object size since that actually
           returns the size of an VMA which may not exist.
         * Rebased for ggtt view changes.
      
      v9:
         * Rebased after code review changes on the preceding patches.
         * Tidy function definitions. (Joonas Lahtinen)
      
      For: VIZ-4726
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com> (v4)
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      50470bb0
    • T
      drm/i915: Use GGTT view when (un)pinning objects to planes · e6617330
      Tvrtko Ursulin 提交于
      To support frame buffer rotation we need to be able to pass on the information
      on what kind of GGTT view is required for display.
      
      This patch just adds the parameter and makes all the callers default to the
      normal view.
      
      v2: Rebased for ggtt view changes.
      v3: Don't limit PIN_MAPPABLE to normal views just yet. (Joonas Lahtinen)
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v3)
      [danvet: s/BUG/WARN/ in the patch hunk because. At least where the
      BUG_ON isn't fatal right away.]
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      e6617330
  14. 20 3月, 2015 1 次提交
    • B
      drm/i915: Track page table reload need · 563222a7
      Ben Widawsky 提交于
      This patch was formerly known as, "Force pd restore when PDEs change,
      gen6-7." I had to change the name because it is needed for GEN8 too.
      
      The real issue this is trying to solve is when a new object is mapped
      into the current address space. The GPU does not snoop the new mapping
      so we must do the gen specific action to reload the page tables.
      
      GEN8 and GEN7 do differ in the way they load page tables for the RCS.
      GEN8 does so with the context restore, while GEN7 requires the proper
      load commands in the command streamer. Non-render is similar for both.
      
      Caveat for GEN7
      The docs say you cannot change the PDEs of a currently running context.
      We never map new PDEs of a running context, and expect them to be
      present - so I think this is okay. (We can unmap, but this should also
      be okay since we only unmap unreferenced objects that the GPU shouldn't
      be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag
      to signal that even if the context is the same, force a reload. It's
      unclear exactly what this does, but I have a hunch it's the right thing
      to do.
      
      The logic assumes that we always emit a context switch after mapping new
      PDEs, and before we submit a batch. This is the case today, and has been
      the case since the inception of hardware contexts. A note in the comment
      let's the user know.
      
      It's not just for gen8. If the current context has mappings change, we
      need a context reload to switch
      
      v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
      and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
      is always null.
      
      v3: Invalidate PPGTT TLBs inside alloc_va_range.
      
      v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
      pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
      neither ctx->ppgtt and aliasing_ppgtt exist.
      
      v5: Removed references to teardown_va_range.
      
      v6: Updated needs_pd_load_pre/post.
      
      v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
      comment about updated PDEs to object_pin/bind (Mika).
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      563222a7