1. 23 Jun 2015 (5 commits)
    • drm/i915: Update alloc_request to return the allocated request · 217e46b5
      Committed by John Harrison
      The alloc_request() function does not actually return the newly allocated
      request. Instead, it must be pulled from ring->outstanding_lazy_request. This
      patch fixes this so that code can create a request and start using it knowing
      exactly which request it actually owns.
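      A minimal sketch of the new calling convention. The `struct ring` and `struct request` types and the `request_alloc()` helper are hypothetical, heavily simplified stand-ins, not the driver's real definitions. The allocator returns the new request through an out-parameter, so the caller never has to read it back from a lazy per-ring field.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the driver structures. */
struct request {
	unsigned int seqno;
};

struct ring {
	unsigned int next_seqno;
};

/*
 * Allocate a request and hand it straight back to the caller through an
 * out-parameter, instead of leaving it in a per-ring "lazy" field that the
 * caller must fish out afterwards.
 */
static int request_alloc(struct ring *ring, struct request **out)
{
	struct request *req;

	*out = NULL;
	req = calloc(1, sizeof(*req));
	if (!req)
		return -ENOMEM;

	req->seqno = ++ring->next_seqno;
	*out = req;	/* the caller now knows exactly which request it owns */
	return 0;
}
```

      Two back-to-back allocations yield two distinct requests, each owned by whoever asked for it.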
      
      v2: Updated for new i915_gem_request_alloc() scheme.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Simplify i915_gem_execbuffer_retire_commands() parameters · adeca76d
      Committed by John Harrison
      Shrink the parameter list of i915_gem_execbuffer_retire_commands() to a single
      structure, as everything it requires is available in the execbuff_params object.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Merged the many do_execbuf() parameters into a structure · 5f19e2bf
      Committed by John Harrison
      The do_execbuf() function takes quite a few parameters. The actual set of
      parameters is going to change with the conversion to passing requests around.
      Further, it is due to grow massively with the arrival of the GPU scheduler.
      
      This patch simplifies the prototype by passing a parameter structure instead.
      Changing the parameter set in the future is then simply a matter of
      adding items to or removing items from the structure.
      
      Note that the structure does not contain absolutely everything that is passed
      in. This is because the intention is to use this structure more extensively
      later in this patch series, and especially in the upcoming GPU scheduler. The
      scheduler needs to hold on to the structure because the final hardware
      submission can be delayed until long after the execbuf IOCTL has returned to
      userspace. It is therefore unsafe to put anything in the structure that is
      local to the IOCTL call itself, such as the 'args' parameter. All entries must
      be copies of data or pointers to structures that are reference counted in some
      way and guaranteed to exist for the lifetime of the batch buffer.
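      A minimal sketch of the "copies or references only" rule, assuming a hypothetical `struct execbuf_params` and a toy refcounted `struct kref_obj` (the driver's real structure and its kref-based objects differ). Scalars such as the dispatch flags and the batch start address are copied; object pointers are only stored after taking a reference, so the block can outlive the IOCTL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a context or batch object. */
struct kref_obj {
	int refcount;
};

static struct kref_obj *obj_get(struct kref_obj *o)
{
	o->refcount++;
	return o;
}

static void obj_put(struct kref_obj *o)
{
	if (--o->refcount == 0)
		free(o);
}

/*
 * Hypothetical execbuf parameter block. Every field is either a plain copy
 * or a pointer on which a reference is held; nothing points at IOCTL-local
 * data such as 'args'.
 */
struct execbuf_params {
	uint32_t dispatch_flags;	/* copied */
	uint64_t batch_start;		/* copied, not a pointer into args */
	struct kref_obj *ctx;		/* reference held */
	struct kref_obj *batch;		/* reference held */
};

static void params_init(struct execbuf_params *p, uint32_t flags,
			uint64_t start, struct kref_obj *ctx,
			struct kref_obj *batch)
{
	p->dispatch_flags = flags;
	p->batch_start = start;
	p->ctx = obj_get(ctx);
	p->batch = obj_get(batch);
}

/* Called whenever the deferred submission finally completes. */
static void params_release(struct execbuf_params *p)
{
	obj_put(p->ctx);
	obj_put(p->batch);
}
```

      A scheduler could keep such a block queued after the IOCTL returns and call `params_release()` only once the batch has actually been submitted and retired.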
      
      v2: Rebased to newer tree and updated for changes to the command parser.
      Specifically, a code shuffle has required saving the batch start address in the
      params structure.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Early alloc request in execbuff · 0c8dac88
      Committed by John Harrison
      Start of explicit request management in the execbuffer code path. This patch
      adds a call to allocate a request structure before any of the actual hardware
      work is done, guaranteeing that all of that work is tagged by a known request.
      At present nothing further is done with the request; the rest comes later in
      the series.

      The only noticeable change is that failure to get a request (e.g. due to lack
      of memory) will be caught earlier in the sequence. It now occurs right at the
      start, before any irreversible work has been done.
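      A minimal sketch of the ordering, assuming hypothetical `request_alloc()` and `do_execbuffer()` helpers and a counter standing in for ring writes. Because the allocation happens first, an out-of-memory failure returns before anything has reached the hardware, so the error path has nothing to unwind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the ordering matters here. */
struct request {
	int id;
};

static bool simulate_oom;		/* force the allocation to fail */
static int ring_commands_written;	/* stands in for hardware work */

static struct request *request_alloc(void)
{
	return simulate_oom ? NULL : calloc(1, sizeof(struct request));
}

static int do_execbuffer(void)
{
	/* Allocate first, before anything that cannot be undone. */
	struct request *req = request_alloc();

	if (!req)
		return -ENOMEM;	/* nothing has touched the ring yet */

	ring_commands_written++;	/* the actual hardware work */

	/* This sketch just frees it; the driver goes on to track it. */
	free(req);
	return 0;
}
```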
      
      v2: Simplified the error handling path.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: i915_add_request must not fail · bf7dc5b7
      Committed by John Harrison
      The i915_add_request() function is called to keep track of work that has been
      written to the ring buffer. It adds epilogue commands to track progress (seqno
      updates and such), moves the request structure onto the right list and performs
      other such housekeeping tasks. However, the work itself has already been written
      to the ring and will get executed whether or not the add request call succeeds.
      So no matter what goes wrong, there is little point in failing the call.
      
      At the moment, this is fine(ish). If the add request does bail early and skip
      the housekeeping, the request will still float around in the
      ring->outstanding_lazy_request field and be picked up next time. This means
      multiple pieces of work will be tagged as the same request, and the driver
      cannot actually wait for the first piece of work until something else has been
      submitted. But it all sort of hangs together.
      
      This patch series is all about removing the outstanding lazy request (OLR) and
      guaranteeing that each piece of work gets its own personal request. That means
      there is no more 'hoovering up of forgotten requests'. If the request does not
      get tracked, it will be leaked. Thus the add request call _must_ not fail. The
      previous patch should have already ensured that it _will_ not fail by removing
      the potential for running out of ring space. This patch enforces the rule by
      actually removing the early exit paths and the return code.
      
      Note that if something does manage to fail and the epilogue commands don't get
      written to the ring, the driver will still hang together. The request will be
      added to the tracking lists. And as in the old case, any subsequent work will
      generate a new seqno which will suffice for marking the old one as complete.
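      A minimal sketch of the resulting shape, assuming hypothetical `emit_epilogue()` and `warn_on()` helpers (the latter a userspace stand-in for the kernel's WARN()). The function returns `void`: a failure to emit the epilogue is reported, but the request is still added to the tracking list so it can never be leaked.

```c
#include <assert.h>
#include <stdio.h>

struct request {
	unsigned int seqno;
	int tracked;	/* 1 once the request is on the tracking list */
};

/* Hypothetical: emitting the seqno-update epilogue needs 4 dwords of space. */
static int emit_epilogue(struct request *req, unsigned int ring_space)
{
	(void)req;
	return ring_space >= 4 ? 0 : -1;
}

/* Userspace stand-in for WARN(): log the condition and return it. */
static int warn_on(int cond, const char *msg)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", msg);
	return cond;
}

/* No return code: the batch is already in the ring, so always track it. */
static void add_request(struct request *req, unsigned int ring_space)
{
	int ret = emit_epilogue(req, ring_space);

	warn_on(ret != 0, "failed to emit request epilogue");
	req->tracked = 1;	/* never skipped, even on failure */
}
```

      Even if the epilogue is missing, a later request's seqno will mark this one complete, which is why continuing is preferred over bailing out.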
      
      v2: Improved WARNings (Tomas Elf review request).
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  2. 22 Jun 2015 (2 commits)
  3. 29 May 2015 (1 commit)
  4. 21 May 2015 (1 commit)
  5. 08 May 2015 (2 commits)
  6. 30 Apr 2015 (1 commit)
  7. 24 Apr 2015 (3 commits)
    • drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c
      Committed by Daniel Vetter
      Currently we have the problem that the decision whether ptes need to
      be (re)written is splattered all over the codebase. Move all that into
      i915_vma_bind. This needs a few changes:
      - Just reuse the PIN_* flags for i915_vma_bind and do the conversion
        to vma->bound in there to avoid duplicating the conversion code all
        over.
      - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
        around) explicit, add PIN_USER for that.
      - Two callers want to update ptes, give them a PIN_UPDATE for that.
      
      Of course we still want to avoid double-binding, but that should be
      taken care of:
      - A ppgtt vma will only ever see PIN_USER, so no issue with
        double-binding.
      - A ggtt vma with aliasing ppgtt needs both types of binding, and we
        track that properly now.
      - A ggtt vma without aliasing ppgtt could be bound twice. Hence the
        lower-level ->bind_vma functions unconditionally set GLOBAL_BIND
        when writing the ggtt ptes.
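      A minimal sketch of centralising the "do the ptes need (re)writing?" decision in one bind function. The flag values and the `struct vma` layout are hypothetical, and the sketch models only the top-level decision, not the lower-level ->bind_vma hooks. PIN_* flags are translated into *_BIND bits once; bindings that already exist are skipped unless PIN_UPDATE asks for all existing bindings to be rewritten.

```c
#include <assert.h>

/* Hypothetical flag values; the real definitions live in the driver. */
#define PIN_GLOBAL   (1u << 0)
#define PIN_USER     (1u << 1)
#define PIN_UPDATE   (1u << 2)

#define GLOBAL_BIND  (1u << 0)
#define LOCAL_BIND   (1u << 1)

struct vma {
	unsigned int bound;	/* which kinds of ptes are already written */
	int pte_writes;		/* counts calls to the low-level bind hook */
};

static void vma_bind(struct vma *vma, unsigned int pin_flags)
{
	unsigned int bind_flags = 0;

	if (pin_flags & PIN_GLOBAL)
		bind_flags |= GLOBAL_BIND;
	if (pin_flags & PIN_USER)
		bind_flags |= LOCAL_BIND;	/* aliasing ppgtt if around */

	if (pin_flags & PIN_UPDATE)
		bind_flags |= vma->bound;	/* rewrite existing bindings */
	else
		bind_flags &= ~vma->bound;	/* avoid double-binding */

	if (bind_flags == 0)
		return;

	vma->pte_writes++;	/* stands in for the ->bind_vma() call */
	vma->bound |= bind_flags;
}
```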
      
      There's still a bit of room for cleanup, but that's for follow-up
      patches.
      
      v2: Fixup fumbles.
      
      v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    • drm/i915: Don't use atomics for pg_dirty_rings · 9258811c
      Committed by Daniel Vetter
      It's already protected by the bkl^Wdev->struct_mutex. While at it
      realign some related code.
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    • drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt · 71b7e54f
      Committed by Daniel Vetter
      We load the ppgtt ptes once per gpu reset/driver load/resume and
      that's all that's needed. Note that this only blows up when we're
      using the allocate_va_range funcs rather than the special-purpose
      ones currently used. With this change we can get rid of that duplication.
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
  8. 20 Apr 2015 (1 commit)
    • drm/i915: Dont clear PIN_GLOBAL in the execbuf pinning fallback · 0229da32
      Committed by Daniel Vetter
      PIN_GLOBAL is set only when userspace asked for it, and that
      is only the case for the gen6 PIPE_CONTROL workaround. We're not
      allowed to just clear this.
      
      The important part of the fallback is to drop the restriction to
      the mappable range.
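      A minimal sketch of the corrected fallback, with hypothetical flag values. The retry drops only the mappable-aperture restriction and leaves PIN_GLOBAL (and any other requested bits) untouched; the buggy variant effectively computed `flags & ~(PIN_MAPPABLE | PIN_GLOBAL)`.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define PIN_MAPPABLE (1u << 0)
#define PIN_GLOBAL   (1u << 1)
#define PIN_USER     (1u << 2)

/*
 * Flags for the retry after a mappable pin fails. PIN_GLOBAL was asked
 * for by userspace (gen6 PIPE_CONTROL workaround) and must be kept.
 */
static unsigned int fallback_pin_flags(unsigned int flags)
{
	return flags & ~PIN_MAPPABLE;
}
```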
      
      This issue was introduced in
      
      commit edf4427b
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Jan 14 11:20:56 2015 +0000
      
          drm/i915: Fallback to using CPU relocations for large batch buffers
      
      v2: Chris pointed out that we also fail to set PIN_GLOBAL when the
      buffer is already bound. Fix this up too.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 10 Apr 2015 (2 commits)
  10. 01 Apr 2015 (1 commit)
    • drm/i915: Rename 'do_execbuf' to 'execbuf_submit' · f3dc74c0
      Committed by John Harrison
      The submission portion of the execbuffer code path was abstracted into a
      function pointer indirection as part of the legacy vs execlist work. The two
      implementation functions are called 'i915_gem_ringbuffer_submission' and
      'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
      already an 'i915_gem_do_execbuffer' function (which is what calls the pointer
      indirection). The name of the pointer is therefore backwards and should be
      changed.
      
      This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.
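      A minimal sketch of the function-pointer indirection described above, with hypothetical names and a trimmed-down `struct submission_ops`. The generic `do_execbuffer()` entry point is the caller; the pointer it calls through names the backend-specific submission step, which is why `execbuf_submit` reads better than `do_execbuf`.

```c
#include <assert.h>
#include <stdbool.h>

enum backend { BACKEND_RINGBUFFER = 1, BACKEND_EXECLISTS = 2 };

/* Hypothetical, trimmed-down submission vtable. */
struct submission_ops {
	int (*execbuf_submit)(int batch);	/* formerly 'do_execbuf' */
};

static int ringbuffer_submission(int batch)
{
	(void)batch;
	return BACKEND_RINGBUFFER;	/* legacy ringbuffer path */
}

static int execlists_submission(int batch)
{
	(void)batch;
	return BACKEND_EXECLISTS;	/* execlist path */
}

static void select_backend(struct submission_ops *ops, bool use_execlists)
{
	ops->execbuf_submit = use_execlists ? execlists_submission
					    : ringbuffer_submission;
}

/* The generic entry point calls through the pointer. */
static int do_execbuffer(const struct submission_ops *ops, int batch)
{
	return ops->execbuf_submit(batch);
}
```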
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  11. 30 Mar 2015 (1 commit)
  12. 27 Mar 2015 (1 commit)
  13. 20 Mar 2015 (2 commits)
    • drm/i915: Track page table reload need · 563222a7
      Committed by Ben Widawsky
      This patch was formerly known as, "Force pd restore when PDEs change,
      gen6-7." I had to change the name because it is needed for GEN8 too.
      
      The real issue this is trying to solve is when a new object is mapped
      into the current address space. The GPU does not snoop the new mapping,
      so we must do the gen-specific action to reload the page tables.
      
      GEN8 and GEN7 do differ in the way they load page tables for the RCS.
      GEN8 does so with the context restore, while GEN7 requires the proper
      load commands in the command streamer. Non-render is similar for both.
      
      Caveat for GEN7
      The docs say you cannot change the PDEs of a currently running context.
      We never map new PDEs of a running context, and expect them to be
      present, so I think this is okay. (We can unmap, but this should also
      be okay since we only unmap unreferenced objects that the GPU shouldn't
      be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
      to signal that even if the context is the same, force a reload. It's
      unclear exactly what this does, but I have a hunch it's the right thing
      to do.
      
      The logic assumes that we always emit a context switch after mapping new
      PDEs, and before we submit a batch. This is the case today, and has been
      the case since the inception of hardware contexts. A note in the comment
      lets the user know.
      
      It's not just for gen8. If the current context's mappings change, we
      need a context reload to switch to the new page tables.
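      A minimal sketch of per-ring dirty tracking for page-directory reloads, assuming a hypothetical five-ring device and a trimmed `struct ppgtt`. Writing new PDEs marks every ring dirty; at context-switch time each ring checks and clears its own bit, so it reloads exactly once. Plain (non-atomic) bit operations are used on the assumption that a single lock, dev->struct_mutex in the driver, serialises all access.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 5	/* hypothetical: RCS, VCS, BCS, VECS, VCS2 */

struct ppgtt {
	/* Bit i set => ring i must reload its page directories. */
	unsigned int pd_dirty_rings;
};

/* Called after new PDEs are written, e.g. from allocate_va_range. */
static void mark_tlbs_dirty(struct ppgtt *ppgtt)
{
	ppgtt->pd_dirty_rings = (1u << NUM_RINGS) - 1;
}

/* Checked when switching context; clears the bit once the reload is emitted. */
static bool needs_pd_load(struct ppgtt *ppgtt, unsigned int ring_id)
{
	unsigned int bit = 1u << ring_id;

	if (!(ppgtt->pd_dirty_rings & bit))
		return false;
	ppgtt->pd_dirty_rings &= ~bit;
	return true;
}
```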
      
      v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
      and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
      is always null.
      
      v3: Invalidate PPGTT TLBs inside alloc_va_range.
      
      v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
      pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes the case
      where neither ctx->ppgtt nor aliasing_ppgtt exists.
      
      v5: Removed references to teardown_va_range.
      
      v6: Updated needs_pd_load_pre/post.
      
      v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
      comment about updated PDEs to object_pin/bind (Mika).
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Fallback to using CPU relocations for large batch buffers · edf4427b
      Committed by Chris Wilson
      If the batch buffer is too large to fit into the aperture and we need a
      GTT mapping for relocations, we currently fail. This only applies to a
      subset of machines for a subset of environments, but is quite undesirable.
      After failing to insert the batch into the GTT, we can simply check
      whether we only need a mappable binding for relocation and, if so,
      revert to using a non-mappable binding and an alternate relocation
      method. However, using relocate_entry_cpu() is excruciatingly slow for
      large buffers on non-LLC as the entire buffer requires clflushing before
      and after the relocation handling. Alternatively, we can implement a
      third relocation method that only clflushes around the relocation entry.
      This is still slower than updating through the GTT, so we prefer using
      the GTT where possible, but it is orders of magnitude faster as we
      typically do not then have to clflush the entire buffer.
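      A minimal sketch of the arithmetic behind "only clflush around the relocation entry", assuming a 64-byte cacheline and a hypothetical `reloc_flush_range()` helper. It computes the cacheline-aligned byte range covering a relocation value of `len` bytes (typically 4 or 8) at `offset`; only that range would be flushed before and after the write. An 8-byte value can straddle two cachelines, which the range handles.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE 64u	/* assumed x86 cacheline size */

/* [*start, *end) is the cacheline-aligned range covering the value. */
static void reloc_flush_range(uint64_t offset, uint32_t len,
			      uint64_t *start, uint64_t *end)
{
	*start = offset & ~(uint64_t)(CACHELINE - 1);
	*end = (offset + len + CACHELINE - 1) & ~(uint64_t)(CACHELINE - 1);
}
```

      For a 4-byte value at offset 100 this gives [64, 128): one cacheline instead of the whole buffer.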
      
      An alternative idea of using a temporary WC mapping of the backing store
      is promising (it should be faster than using the GTT itself), but
      requires fairly extensive arch/x86 support, along the lines of
      kmap_atomic_prot_pfn() (which is not universally implemented even for
      x86).
      
      Testcase: igt/gem_exec_big #pnv,byt
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Add a WARN_ONCE for the impossible reloc case and explain in
      a short comment why we want to avoid ping-pong.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  14. 18 Mar 2015 (2 commits)
  15. 26 Feb 2015 (1 commit)
    • drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc
      Committed by John Harrison
      There is a flags word that is passed through the execbuffer code path all the
      way from initial decoding of the user parameters down to the very final dispatch
      buffer call. It is simply called 'flags'. Unfortunately, there are many other
      flags words floating around in the same blocks of code, and there will be even
      more once the GPU scheduler arrives.
      
      This patch makes it more obvious exactly which flags word is which by renaming
      'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
      already have an 'I915_DISPATCH_' prefix on them and so are not quite so
      ambiguous.
      
      OTC-Jira: VIZ-1587
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      [danvet: Resolve conflict with Chris' rework of the bb parsing.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 24 Feb 2015 (1 commit)
  17. 27 Jan 2015 (1 commit)
    • drm/i915: Specify bsd rings through exec flag · 8d360dff
      Committed by Zhipeng Gong
      On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetric.
      For example, HEVC GPU commands can only be dispatched to the VCS1 ring.
      But userspace has no control over whether VCS1 or VCS2 is used. This patch
      introduces a mechanism to avoid the default ping-pong mode and use one
      specific ring through an execution flag. This mechanism is usable on all
      platforms with 2 VCS rings.
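      A minimal sketch of ring selection from the execbuffer flags. The I915_EXEC_BSD_* values mirror include/uapi/drm/i915_drm.h as we understand it (verify against your copy); the `select_bsd_ring()` helper and the global ping-pong state are hypothetical simplifications of the driver logic. An explicit RING1/RING2 request is honoured; the default mode alternates between the two rings.

```c
#include <assert.h>
#include <stdint.h>

/* Values as in include/uapi/drm/i915_drm.h (verify against your headers). */
#define I915_EXEC_BSD_SHIFT    (13)
#define I915_EXEC_BSD_MASK     (3 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_DEFAULT  (0 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING1    (1 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING2    (2 << I915_EXEC_BSD_SHIFT)

enum vcs { VCS1 = 0, VCS2 = 1, VCS_INVALID = -1 };

static int next_vcs;	/* hypothetical ping-pong state for the default mode */

static int select_bsd_ring(uint64_t exec_flags)
{
	int ring;

	switch (exec_flags & I915_EXEC_BSD_MASK) {
	case I915_EXEC_BSD_DEFAULT:
		ring = next_vcs;
		next_vcs ^= 1;		/* alternate between VCS1 and VCS2 */
		return ring;
	case I915_EXEC_BSD_RING1:
		return VCS1;		/* e.g. HEVC work that must use VCS1 */
	case I915_EXEC_BSD_RING2:
		return VCS2;
	default:
		return VCS_INVALID;	/* reserved value, treated as an error here */
	}
}
```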
      
      The open source usage is from these two commits in vaapi/intel:
      	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
      	Author: Zhao, Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:19 2014 +0800
      
      	    Allow the batchbuffer to be submitted with override flag
      
      	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
      	Author: Zhao Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:22 2014 +0800
      
      	    Add the override flag to assure that HEVC video command
      		always uses BSD ring0 for SKL GT3 machine
      
      v2: fix whitespace (Rodrigo)
      v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
      v4: change the comment (Zhipeng)
      v5: address Daniel's comment (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  18. 08 Jan 2015 (1 commit)
  19. 24 Dec 2014 (1 commit)
  20. 16 Dec 2014 (4 commits)
    • drm/i915: Tidy up execbuffer command parsing code · 71745376
      Committed by Brad Volkin
      Move it to a separate function since the main do_execbuffer function
      already has so much going on.
      
      v2:
      - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
        feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Mark shadow batch buffers as purgeable · 0079a7df
      Committed by Brad Volkin
      By adding a new exec_entry flag, we cleanly mark the shadow objects
      as purgeable after they are on the active list.
      
      v2:
      - Move 'shadow_batch_obj->madv = I915_MADV_WILLNEED' inside _get
        fnc (danvet, from v4 6/7 feedback)
      
      v3:
      - Remove duplicate 'madv = I915_MADV_WILLNEED' (danvet, from v6 4/5)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Use batch length instead of object size in command parser · b9ffd80e
      Committed by Brad Volkin
      Previously we couldn't trust the user-supplied batch length because
      it came directly from userspace (i.e. untrusted code). It would have
      affected what commands software parsed without regard to what hardware
      would actually execute, leaving a potential hole.
      
      With the parser now copying the user supplied batch buffer and writing
      MI_NOP commands to any space after the copied region, we can safely use
      the batch length input. This should be a performance win as the actual
      batch length is frequently much smaller than the allocated object size.
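      A minimal sketch of the copy step, assuming a hypothetical `copy_batch()` helper working on plain memory rather than GEM objects. Only `batch_len` bytes starting at `batch_start_offset` are copied into the shadow buffer, and the remainder is filled with MI_NOOP; since MI_NOOP encodes as an all-zero dword, a zero memset produces NOP instructions. The range checks guard against a length that runs past either buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy the user-supplied batch range into the shadow buffer and pad the
 * rest with MI_NOOP (an all-zero dword), so the hardware cannot execute
 * anything the parser did not see. Sizes and offsets are in bytes.
 * Returns -1 if the requested range does not fit.
 */
static int copy_batch(uint32_t *shadow, size_t shadow_size,
		      const uint32_t *src, size_t src_size,
		      size_t batch_start_offset, size_t batch_len)
{
	if (batch_start_offset > src_size ||
	    batch_len > src_size - batch_start_offset ||
	    batch_len > shadow_size)
		return -1;

	memcpy(shadow, (const char *)src + batch_start_offset, batch_len);
	memset((char *)shadow + batch_len, 0, shadow_size - batch_len);
	return 0;
}
```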
      
      v2: Fix handling of non-zero batch_start_offset
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • drm/i915: Use batch pools with the command parser · 78a42377
      Committed by Brad Volkin
      This patch sets up all of the tracking and copying necessary to
      use batch pools with the command parser and dispatches the copied
      (shadow) batch to the hardware.
      
      After this patch, the parser is in 'enabling' mode.
      
      Note that performance takes a hit from the copy in some cases
      and will likely need some work. On a rough first pass, the memcpy
      appears to be the bottleneck. Without having done a deeper
      analysis, two ideas that come to mind are:
      1) Copy sections of the batch at a time, as they are reached
         by parsing. Might improve cache locality.
      2) Copy only up to the userspace-supplied batch length and
         memset the rest of the buffer. Reduces the number of reads.
      
      v2:
      - Remove setting the capacity of the pool
      - One global pool instead of per-ring pools
      - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
      - Memset any space in the shadow batch beyond what gets copied
      - Rebased on execlist prep refactoring
      
      v3:
      - Rebase on chained batch handling
      - Squash in setting the secure dispatch flag
      - Add a note about the interaction w/secure dispatch pinning
      - Check for request->batch_obj == NULL in i915_gem_free_request
      
      v4:
      - Fix read domains for shadow_batch_obj
      - Remove the set_to_gtt_domain call from i915_parse_cmds
      - ggtt_pin/unpin in the parser block to simplify error handling
      - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
      - Remove i915_gem_batch_pool_put calls
      
      v5:
      - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
        the parser (danvet, from v4 0/7 feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  21. 15 Dec 2014 (1 commit)
    • drm/i915: Infrastructure for supporting different GGTT views per object · fe14d5f4
      Committed by Tvrtko Ursulin
      Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
      to map objects into the same address space multiple times.
      
      Added a GGTT view concept and linked it with the VMA to distinguish between
      multiple instances per address space.
      
      New objects and GEM functions which do not take this new view as a parameter
      assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
      previous behaviour.
      
      This now means that objects can have multiple VMA entries so the code which
      assumed there will only be one also had to be modified.
      
      Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
      which is DMA mapped on first VMA instantiation and unmapped on the last one
      going away.
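      A minimal sketch of per-view VMAs and the first/last DMA-mapping rule, using a hypothetical fixed-size model (`struct object`, `struct vma`, `GGTT_VIEW_ALTERNATE` and the helpers are illustrative, not the driver's types). A VMA is now identified by (address space, view) rather than by address space alone; the first VMA created maps the pages and the last one destroyed unmaps them.

```c
#include <assert.h>
#include <stddef.h>

enum ggtt_view_type { GGTT_VIEW_NORMAL = 0, GGTT_VIEW_ALTERNATE };

#define MAX_VMAS 4

struct vma {
	int vm_id;			/* which address space */
	enum ggtt_view_type view;	/* which view within it */
	int in_use;
};

struct object {
	struct vma vmas[MAX_VMAS];
	int vma_count;
	int dma_mapped;		/* pages are DMA-mapped while any VMA exists */
};

/* The same address space may now hold several VMAs, one per view. */
static struct vma *obj_to_vma_view(struct object *obj, int vm_id,
				   enum ggtt_view_type view)
{
	for (int i = 0; i < MAX_VMAS; i++)
		if (obj->vmas[i].in_use && obj->vmas[i].vm_id == vm_id &&
		    obj->vmas[i].view == view)
			return &obj->vmas[i];
	return NULL;
}

/* The first VMA instantiation DMA-maps the pages. */
static struct vma *vma_create(struct object *obj, int vm_id,
			      enum ggtt_view_type view)
{
	for (int i = 0; i < MAX_VMAS; i++) {
		if (!obj->vmas[i].in_use) {
			obj->vmas[i] = (struct vma){ vm_id, view, 1 };
			if (obj->vma_count++ == 0)
				obj->dma_mapped = 1;
			return &obj->vmas[i];
		}
	}
	return NULL;
}

/* The last VMA going away unmaps the pages. */
static void vma_destroy(struct object *obj, struct vma *vma)
{
	vma->in_use = 0;
	if (--obj->vma_count == 0)
		obj->dma_mapped = 0;
}
```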
      
      v2:
          * Removed per view special casing in i915_gem_ggtt_prepare /
            finish_object in favour of creating and destroying DMA mappings
            on first VMA instantiation and last VMA destruction. (Daniel Vetter)
          * Simplified i915_vma_unbind which does not need to count the GGTT views.
            (Daniel Vetter)
          * Also moved obj->map_and_fenceable reset under the same check.
          * Checkpatch cleanups.
      
      v3:
          * Only retire objects once the last VMA is unbound.
      
      v4:
          * Keep scatter-gather table for alternative views persistent for the
            lifetime of the VMA.
          * Propagate binding errors to callers and handle appropriately.
      
      v5:
          * Explicitly look for normal GGTT view in i915_gem_obj_bound to align
            usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
          * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
          * Removed stray semi-colon in i915_gem_object_set_cache_level.
      
      For: VIZ-4544
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Drop hunk from i915_gem_shrink since it's just prettification
      but upsets a __must_check warning.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  22. 03 Dec 2014 (4 commits)
  23. 21 Nov 2014 (1 commit)