1. 23 Jun, 2015 12 commits
    • J
      drm/i915: Update i915_gem_object_sync() to take a request structure · 91af127f
      John Harrison committed
      The plan is to pass requests around as the basic submission tracking structure
      rather than rings and contexts. This patch updates the i915_gem_object_sync()
      code path.
      
      v2: Much more complex patch to share a single request between the sync and the
      page flip. The _sync() function now supports lazy allocation of the request
      structure. That is, if one is passed in then that will be used. If one is not,
      then a request will be allocated and passed back out. Note that the _sync() code
      does not necessarily require a request. Thus one will only be created in
      certain situations. The reason the lazy allocation must be done within the
      _sync() code itself is because the decision to need one or not is not really
      something that code above can second guess (except in the case where one is
      definitely not required because no ring is passed in).
      
      The call chains above _sync() now support passing a request through, with most
      callers passing in NULL and assuming that no request will be required (because
      they also pass in NULL for the ring and therefore can't be generating any ring
      code).
      
      The exception is intel_crtc_page_flip() which now supports having a request
      returned from _sync(). If one is, then that request is shared by the page flip
      (if the page flip is of a type to need a request). If _sync() does not generate
      a request but the page flip does need one, then the page flip path will create
      its own request.
      
      v3: Updated comment description to be clearer about 'to_req' parameter (Tomas
      Elf review request). Rebased onto newer tree that significantly changed the
      synchronisation code.
      
      v4: Updated comments from review feedback (Tomas Elf)
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
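The lazy allocation described in the v2 notes above can be pictured with a small, self-contained sketch. This is not the driver's code: `struct request`, `request_alloc()` and `object_sync()` are hypothetical stand-ins, and the decision "a request is needed" is reduced to "the object was last used on a different ring", which is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct ring { int id; };
struct request { struct ring *ring; int waits_emitted; };
struct object { struct ring *last_ring; /* NULL when the object is idle */ };

/* Allocate a request bound to 'ring'; returns NULL on allocation failure. */
static struct request *request_alloc(struct ring *ring)
{
    struct request *req = calloc(1, sizeof(*req));

    if (req)
        req->ring = ring;
    return req;
}

/*
 * Synchronise 'obj' for use on ring 'to'.
 *
 * '*to_req' may be NULL on entry. A request is allocated and passed back
 * only if ring commands actually have to be emitted; a request that was
 * passed in is reused rather than replaced.
 */
static int object_sync(struct object *obj, struct ring *to,
                       struct request **to_req)
{
    /* No target ring: no ring commands can be generated, so no request. */
    if (to == NULL)
        return 0;

    /* Idle, or already on the target ring: nothing to wait for. */
    if (obj->last_ring == NULL || obj->last_ring == to)
        return 0;

    /* A cross-ring wait is required, so a request is needed now. */
    assert(to_req != NULL);
    if (*to_req == NULL) {
        *to_req = request_alloc(to);
        if (*to_req == NULL)
            return -1;
    }

    (*to_req)->waits_emitted++; /* stand-in for emitting the wait commands */
    return 0;
}
```

A caller such as the page flip path would start with a NULL request pointer, call the sync function, and afterwards either share the request that came back or create its own if none was generated.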
    • J
      drm/i915: Update i915_switch_context() to take a request structure · ba01cc93
      John Harrison committed
      Now that the request is guaranteed to specify the context, it is possible to
      update the context switch code to use requests rather than ring and context
      pairs. This patch updates i915_switch_context() accordingly.
      
      Also removed the warning that the request's context must match the last context
      switch's context. As the context switch now gets the context object from the
      request structure, there is no longer any scope for the two to become out of
      step.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Add flag to i915_add_request() to skip the cache flush · 5b4a60c2
      John Harrison committed
      In order to explicitly track all GPU work (and completely remove the outstanding
      lazy request), it is necessary to add extra i915_add_request() calls to various
      places. Some of these do not need the implicit cache flush done as part of the
      standard batch buffer submission process.
      
      This patch adds a flag to _add_request() to specify whether the flush is
      required or not.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Update execbuffer_move_to_active() to take a request structure · 8a8edb59
      John Harrison committed
      The plan is to pass requests around as the basic submission tracking structure
      rather than rings and contexts. This patch updates the
      execbuffer_move_to_active() code path.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Update move_to_gpu() to take a request structure · 535fbe82
      John Harrison committed
      The plan is to pass requests around as the basic submission tracking structure
      rather than rings and contexts. This patch updates the move_to_gpu() code paths.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Update the dispatch tracepoint to use params->request · 95c24161
      John Harrison committed
      Updated a couple of trace points to use the now cached request pointer rather
      than extracting it from the ring.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Add request to execbuf params and add explicit cleanup · 6a6ae79a
      John Harrison committed
      Rather than just having a local request variable in the execbuff code, the
      request pointer is now stored in the execbuff params structure. Also added
      explicit cleanup of the request (plus wiping the OLR to match) in the error
      case. This means that the execbuff code is no longer dependent upon the OLR
      keeping track of the request so as to not leak it when things do go wrong. Note
      that in the success case, the i915_add_request() at the end of the submission
      function will tidy up the request and clear the OLR.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Update alloc_request to return the allocated request · 217e46b5
      John Harrison committed
      The alloc_request() function does not actually return the newly allocated
      request. Instead, it must be pulled from ring->outstanding_lazy_request. This
      patch fixes this so that code can create a request and start using it knowing
      exactly which request it actually owns.
      
      v2: Updated for new i915_gem_request_alloc() scheme.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Simplify i915_gem_execbuffer_retire_commands() parameters · adeca76d
      John Harrison committed
      Shrunk the parameter list of i915_gem_execbuffer_retire_commands() to a single
      structure as everything it requires is available in the execbuff_params object.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: Merged the many do_execbuf() parameters into a structure · 5f19e2bf
      John Harrison committed
      The do_execbuf() function takes quite a few parameters. The actual set of
      parameters is going to change with the conversion to passing requests around.
      Further, it is due to grow massively with the arrival of the GPU scheduler.
      
      This patch simplifies the prototype by passing a parameter structure instead.
      Changing the parameter set in the future is then simply a matter of
      adding/removing items to the structure.
      
      Note that the structure does not contain absolutely everything that is passed
      in. This is because the intention is to use this structure more extensively
      later in this patch series and more especially in the GPU scheduler that is
      coming soon. The latter requires hanging on to the structure as the final
      hardware submission can be delayed until long after the execbuf IOCTL has
      returned to user land. Thus it is unsafe to put anything in the structure that
      is local to the IOCTL call itself - such as the 'args' parameter. All entries
      must be copies of data or pointers to structures that are reference counted in
      some way and guaranteed to exist for the duration of the batch buffer's life.
      
      v2: Rebased to newer tree and updated for changes to the command parser.
      Specifically, a code shuffle has required saving the batch start address in the
      params structure.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
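The lifetime rule spelled out in the commit message above (copies of data or reference-counted pointers only, nothing local to the IOCTL call) can be sketched as follows. All type and field names here are hypothetical and much smaller than anything in the driver; the point is only that the parameter structure stays valid after the IOCTL arguments are gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced versions of the objects involved. */
struct ring { int id; };
struct context { int refcount; };

/* IOCTL-local arguments: only valid while the IOCTL is running. */
struct ioctl_args { uint64_t batch_start_offset; uint32_t flags; };

/* Everything in here must outlive the IOCTL call. */
struct execbuf_params {
    struct ring *ring;        /* long-lived device object */
    struct context *ctx;      /* reference counted */
    uint64_t batch_start;     /* copied by value */
    uint32_t dispatch_flags;  /* copied by value */
};

/* Fill 'p' from the IOCTL arguments without keeping a pointer to them. */
static void params_init(struct execbuf_params *p, struct ring *ring,
                        struct context *ctx, const struct ioctl_args *args)
{
    assert(p != NULL && args != NULL);
    ctx->refcount++;          /* hold a reference for as long as 'p' lives */
    p->ring = ring;
    p->ctx = ctx;
    p->batch_start = args->batch_start_offset;
    p->dispatch_flags = args->flags;
}

/* One structure replaces the long argument list of the submit function. */
static int execbuf_submit(const struct execbuf_params *p)
{
    return (p->ring != NULL && p->ctx != NULL) ? 0 : -1;
}
```

Adding or removing a submission parameter then means touching the structure and the code that uses the field, not every prototype along the call chain.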
    • J
      drm/i915: Early alloc request in execbuff · 0c8dac88
      John Harrison committed
      Start of explicit request management in the execbuffer code path. This patch
      adds a call to allocate a request structure before all the actual hardware work
      is done. Thus guaranteeing that all that work is tagged by a known request. At
      present, nothing further is done with the request, the rest comes later in the
      series.
      
      The only noticeable change is that failure to get a request (e.g. due to lack of
      memory) will be caught earlier in the sequence. It now occurs right at the start
      before any un-undoable work has been done.
      
      v2: Simplified the error handling path.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    • J
      drm/i915: i915_add_request must not fail · bf7dc5b7
      John Harrison committed
      The i915_add_request() function is called to keep track of work that has been
      written to the ring buffer. It adds epilogue commands to track progress (seqno
      updates and such), moves the request structure onto the right list and other
      such house keeping tasks. However, the work itself has already been written to
      the ring and will get executed whether or not the add request call succeeds. So
      no matter what goes wrong, there isn't a whole lot of point in failing the call.
      
      At the moment, this is fine(ish). If the add request does bail early on and not
      do the housekeeping, the request will still float around in the
      ring->outstanding_lazy_request field and be picked up next time. It means
      multiple pieces of work will be tagged as the same request and the driver can't
      actually wait for the first piece of work until something else has been
      submitted. But it all sort of hangs together.
      
      This patch series is all about removing the OLR and guaranteeing that each piece
      of work gets its own personal request. That means that there is no more
      'hoovering up of forgotten requests'. If the request does not get tracked then
      it will be leaked. Thus the add request call _must_ not fail. The previous patch
      should have already ensured that it _will_ not fail by removing the potential
      for running out of ring space. This patch enforces the rule by actually removing
      the early exit paths and the return code.
      
      Note that if something does manage to fail and the epilogue commands don't get
      written to the ring, the driver will still hang together. The request will be
      added to the tracking lists. And as in the old case, any subsequent work will
      generate a new seqno which will suffice for marking the old one as complete.
      
      v2: Improved WARNings (Tomas Elf review request).
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
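The "must not fail" rule from the commit message above can be illustrated with a toy model. Everything in it is hypothetical (the structures, the `emit_fails` test hook, the fixed-size tracking array); it only shows the shape of the idea: the function returns void, warns when the epilogue cannot be written, and tracks the request regardless so that it cannot be leaked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TRACKED 8

/* Hypothetical, simplified request and ring. */
struct request { unsigned seqno; bool epilogue_written; };
struct ring {
    struct request *tracked[MAX_TRACKED];
    int n_tracked;
    unsigned next_seqno;
    bool emit_fails;          /* test hook: force epilogue emission to fail */
};

/* Stand-in for writing the seqno update and similar epilogue commands. */
static int emit_epilogue(struct ring *ring, struct request *req)
{
    if (ring->emit_fails)
        return -1;
    req->epilogue_written = true;
    return 0;
}

/*
 * Returns void: the work is already in the ring, so the caller has
 * nothing useful to do with an error code.
 */
static void add_request(struct ring *ring, struct request *req)
{
    req->seqno = ++ring->next_seqno;

    if (emit_epilogue(ring, req) != 0)
        fprintf(stderr, "WARN: epilogue emission failed for seqno %u\n",
                req->seqno);

    /* Track the request whatever happened above, so it is never leaked. */
    assert(ring->n_tracked < MAX_TRACKED);
    ring->tracked[ring->n_tracked++] = req;
}

/* A later seqno reaching the hardware also completes earlier requests
 * (seqno wraparound is ignored in this sketch). */
static bool request_complete(unsigned hw_seqno, const struct request *req)
{
    return hw_seqno >= req->seqno;
}
```

In this model a request whose epilogue was lost is still on the tracking list, and it is reported complete as soon as the seqno of any later request is reached.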
  2. 22 Jun, 2015 2 commits
  3. 29 May, 2015 1 commit
  4. 21 May, 2015 1 commit
  5. 08 May, 2015 2 commits
  6. 30 Apr, 2015 1 commit
  7. 24 Apr, 2015 3 commits
    • D
      drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c
      Daniel Vetter committed
      Currently we have the problem that the decision whether ptes need to
      be (re)written is splattered all over the codebase. Move all that into
      i915_vma_bind. This needs a few changes:
      - Just reuse the PIN_* flags for i915_vma_bind and do the conversion
        to vma->bound in there to avoid duplicating the conversion code all
        over.
      - We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
        around) explicit, add PIN_USER for that.
      - Two callers want to update ptes, give them a PIN_UPDATE for that.
      
      Of course we still want to avoid double-binding, but that should be
      taken care of:
      - A ppgtt vma will only ever see PIN_USER, so no issue with
        double-binding.
      - A ggtt vma with aliasing ppgtt needs both types of binding, and we
        track that properly now.
      - A ggtt vma without aliasing ppgtt could be bound twice. In the
        lower-level ->bind_vma functions hence unconditionally set
        GLOBAL_BIND when writing the ggtt ptes.
      
      There's still a bit of room for cleanup, but that's for follow-up
      patches.
      
      v2: Fixup fumbles.
      
      v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
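The flag handling described in the commit message above can be sketched in a few lines. The flag names come from the message; their numeric values, the `struct vma` layout and the `pte_writes` counter are assumptions made for this illustration, and the logic is a reading of the description rather than a copy of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; only the names appear in the commit message. */
#define PIN_GLOBAL  (1u << 0)
#define PIN_USER    (1u << 1)
#define PIN_UPDATE  (1u << 2)

#define GLOBAL_BIND (1u << 0)
#define LOCAL_BIND  (1u << 1)

struct vma {
    unsigned bound;       /* which bindings currently exist */
    unsigned pte_writes;  /* counts how often PTEs were (re)written */
};

/* Convert PIN_* flags to bind flags in one place and skip double binding. */
static void vma_bind(struct vma *vma, unsigned pin_flags)
{
    unsigned bind_flags = 0;

    assert(vma != NULL);

    if (pin_flags & PIN_GLOBAL)
        bind_flags |= GLOBAL_BIND;
    if (pin_flags & PIN_USER)
        bind_flags |= LOCAL_BIND;

    if (pin_flags & PIN_UPDATE)
        bind_flags |= vma->bound;   /* rewrite the existing bindings too */
    else
        bind_flags &= ~vma->bound;  /* drop bindings already in place */

    if (bind_flags == 0)
        return;                     /* nothing new to bind */

    vma->pte_writes++;              /* stand-in for writing the PTEs */
    vma->bound |= bind_flags;
}
```

Binding twice with the same flag writes the PTEs once, while a `PIN_UPDATE` call rewrites whatever is already bound.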
    • D
      drm/i915: Don't use atomics for pg_dirty_rings · 9258811c
      Daniel Vetter committed
      It's already protected by the bkl^Wdev->struct_mutex. While at it
      realign some related code.
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
    • D
      drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt · 71b7e54f
      Daniel Vetter committed
      We load the ppgtt ptes once per gpu reset/driver load/resume and
      that's all that's needed. Note that this only blows up when we're
      using the allocate_va_range funcs and not the special-purpose ones
      used. With this change we can get rid of that duplication.
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
  8. 20 Apr, 2015 1 commit
    • D
      drm/i915: Dont clear PIN_GLOBAL in the execbuf pinning fallback · 0229da32
      Daniel Vetter committed
      PIN_GLOBAL is set only when userspace asked for it, and that
      is only the case for the gen6 PIPE_CONTROL workaround. We're not
      allowed to just clear this.
      
      The important part of the fallback is to drop the restriction to
      the mappable range.
      
      This issue has been introduced in
      
      commit edf4427b
      Author: Chris Wilson <chris@chris-wilson.co.uk>
      Date:   Wed Jan 14 11:20:56 2015 +0000
      
          drm/i915: Fallback to using CPU relocations for large batch buffers
      
      v2: Chris pointed out that we also fail to set PIN_GLOBAL when the
      buffer is already bound. Fix this up too.
      
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  9. 10 Apr, 2015 2 commits
  10. 01 Apr, 2015 1 commit
    • J
      drm/i915: Rename 'do_execbuf' to 'execbuf_submit' · f3dc74c0
      John Harrison committed
      The submission portion of the execbuffer code path was abstracted into a
      function pointer indirection as part of the legacy vs execlist work. The two
      implementation functions are called 'i915_gem_ringbuffer_submission' and
      'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
      already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
      indirection). The name of the pointer is therefore considered to be backwards
      and should be changed.
      
      This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.
      
      For: VIZ-5115
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Tomas Elf <tomas.elf@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  11. 30 Mar, 2015 1 commit
  12. 27 Mar, 2015 1 commit
  13. 20 Mar, 2015 2 commits
    • B
      drm/i915: Track page table reload need · 563222a7
      Ben Widawsky committed
      This patch was formerly known as, "Force pd restore when PDEs change,
      gen6-7." I had to change the name because it is needed for GEN8 too.
      
      The real issue this is trying to solve is when a new object is mapped
      into the current address space. The GPU does not snoop the new mapping
      so we must do the gen specific action to reload the page tables.
      
      GEN8 and GEN7 do differ in the way they load page tables for the RCS.
      GEN8 does so with the context restore, while GEN7 requires the proper
      load commands in the command streamer. Non-render is similar for both.
      
      Caveat for GEN7
      The docs say you cannot change the PDEs of a currently running context.
      We never map new PDEs of a running context, and expect them to be
      present - so I think this is okay. (We can unmap, but this should also
      be okay since we only unmap unreferenced objects that the GPU shouldn't
      be trying to va->pa xlate.) The MI_SET_CONTEXT command does have a flag
      to signal that even if the context is the same, force a reload. It's
      unclear exactly what this does, but I have a hunch it's the right thing
      to do.
      
      The logic assumes that we always emit a context switch after mapping new
      PDEs, and before we submit a batch. This is the case today, and has been
      the case since the inception of hardware contexts. A note in the comment
      lets the user know.
      
      It's not just for gen8. If the current context has mappings change, we
      need a context reload to switch
      
      v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
      and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
      is always null.
      
      v3: Invalidate PPGTT TLBs inside alloc_va_range.
      
      v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
      pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
      neither ctx->ppgtt and aliasing_ppgtt exist.
      
      v5: Removed references to teardown_va_range.
      
      v6: Updated needs_pd_load_pre/post.
      
      v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
      comment about updated PDEs to object_pin/bind (Mika).
      
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
      Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
      Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
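The dirty tracking that the commit message above describes can be modelled as one bit per ring. The names `pd_dirty_rings` and `mark_tlbs_dirty` appear in the message; the ring count, the combined `needs_pd_load()` helper and `switch_context()` are simplified assumptions for this sketch and do not reproduce the driver's pre/post split.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_RINGS 5  /* hypothetical ring count */

struct ppgtt {
    unsigned long pd_dirty_rings;  /* bit N set: ring N must reload PDs */
};

/* Called when new page directory entries are mapped: every ring is stale. */
static void mark_tlbs_dirty(struct ppgtt *ppgtt)
{
    ppgtt->pd_dirty_rings = (1ul << NUM_RINGS) - 1;
}

static bool needs_pd_load(const struct ppgtt *ppgtt, int ring_id)
{
    assert(ring_id >= 0 && ring_id < NUM_RINGS);
    return (ppgtt->pd_dirty_rings & (1ul << ring_id)) != 0;
}

/*
 * Context switch on one ring. Returns true if the page tables were
 * reloaded, which clears the dirty bit for that ring only.
 */
static bool switch_context(struct ppgtt *ppgtt, int ring_id)
{
    if (!needs_pd_load(ppgtt, ring_id))
        return false;

    /* The gen-specific page table reload would be emitted here. */
    ppgtt->pd_dirty_rings &= ~(1ul << ring_id);
    return true;
}
```

Because the bits are per ring, a reload on one ring leaves the others marked dirty until they perform their own context switch.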
    • C
      drm/i915: Fallback to using CPU relocations for large batch buffers · edf4427b
      Chris Wilson committed
      If the batch buffer is too large to fit into the aperture and we need a
      GTT mapping for relocations, we currently fail. This only applies to a
      subset of machines for a subset of environments, quite undesirable. We
      can simply check after failing to insert the batch into the GTT as to
      whether we only need a mappable binding for relocation and, if so, we can
      revert to using a non-mappable binding and an alternate relocation
      method. However, using relocate_entry_cpu() is excruciatingly slow for
      large buffers on non-LLC as the entire buffer requires clflushing before
      and after the relocation handling. Alternatively, we can implement a
      third relocation method that only clflushes around the relocation entry.
      This is still slower than updating through the GTT, so we prefer using
      the GTT where possible, but is orders of magnitude faster as we
      typically do not have to then clflush the entire buffer.
      
      An alternative idea of using a temporary WC mapping of the backing store
      is promising (it should be faster than using the GTT itself), but
      requires fairly extensive arch/x86 support - along the lines of
      kmap_atomic_prot_pfn() (which is not universally implemented even for
      x86).
      
      Testcase: igt/gem_exec_big #pnv,byt
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      [danvet: Add a WARN_ONCE for the impossible reloc case and explain in
      a short comment why we want to avoid ping-pong.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
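The choice between the three relocation methods discussed above can be written as a small decision function. The commit message only states the preferences (GTT where possible, per-entry clflush as the fallback, whole-buffer CPU relocation being slow on non-LLC); the exact ordering, the `cpu_coherent` field and all names below are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reloc_method {
    RELOC_CPU,      /* write through a CPU mapping of the whole object */
    RELOC_GTT,      /* write through the mappable GTT aperture */
    RELOC_CLFLUSH,  /* CPU write, clflush only around the patched entry */
    RELOC_NONE      /* no usable method: the "impossible" case */
};

/* Hypothetical summary of the state the decision depends on. */
struct batch_obj {
    bool cpu_coherent;  /* assumption: CPU writes need no flushing */
    bool gtt_mappable;  /* bound inside the mappable aperture */
};

static enum reloc_method pick_reloc_method(const struct batch_obj *obj,
                                           bool cpu_has_clflush)
{
    assert(obj != NULL);

    if (obj->cpu_coherent)
        return RELOC_CPU;      /* cheap when no flushing is required */
    if (obj->gtt_mappable)
        return RELOC_GTT;      /* preferred for non-coherent objects */
    if (cpu_has_clflush)
        return RELOC_CLFLUSH;  /* large batch that missed the aperture */
    return RELOC_NONE;
}
```

The third branch is the one this commit introduces: a batch too large for the mappable aperture no longer fails, it falls through to the per-entry clflush path.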
  14. 18 Mar, 2015 2 commits
  15. 26 Feb, 2015 1 commit
    • J
      drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc
      John Harrison committed
      There is a flags word that is passed through the execbuffer code path all the
      way from initial decoding of the user parameters down to the very final dispatch
      buffer call. It is simply called 'flags'. Unfortunately, there are many other
      flags words floating around in the same blocks of code. Even more once the GPU
      scheduler arrives.
      
      This patch makes it more obvious exactly which flags word is which by renaming
      'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
      already have an 'I915_DISPATCH_' prefix on them and so are not quite so
      ambiguous.
      
      OTC-Jira: VIZ-1587
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      [danvet: Resolve conflict with Chris' rework of the bb parsing.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 24 Feb, 2015 1 commit
  17. 27 Jan, 2015 1 commit
    • Z
      drm/i915: Specify bsd rings through exec flag · 8d360dff
      Zhipeng Gong committed
      On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetrical.
      For example, HEVC GPU commands can be only dispatched to VCS1 ring.
      But userspace has no control when using VCS1 or VCS2. This patch introduces
      a mechanism to avoid the default ping-pong mode and use one specific ring
      through execution flag. This mechanism is usable for all the platforms
      with 2 VCS rings.
      
      The open source usage is from these two commits in vaapi/intel:
      	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
      	Author: Zhao, Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:19 2014 +0800
      
      	    Allow the batchbuffer to be submitted with override flag
      
      	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
      	Author: Zhao Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:22 2014 +0800
      
      	    Add the override flag to assure that HEVC video command
      		always uses BSD ring0 for SKL GT3 machine
      
      v2: fix whitespace (Rodrigo)
      v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
      v4: change the comment (Zhipeng)
      v5: address Daniel's comment (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
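Selecting a specific VCS ring through an execution flag, as the commit message above describes, might look like the sketch below. The macro names, the bit position of the field and the way the default ping-pong mode is modelled (simple alternation per call) are all assumptions for illustration and should not be read as the driver's uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: a two-bit field inside the exec flags word. */
#define EXEC_BSD_SHIFT   13
#define EXEC_BSD_MASK    (3u << EXEC_BSD_SHIFT)
#define EXEC_BSD_DEFAULT (0u << EXEC_BSD_SHIFT)
#define EXEC_BSD_RING1   (1u << EXEC_BSD_SHIFT)
#define EXEC_BSD_RING2   (2u << EXEC_BSD_SHIFT)

enum vcs_ring { VCS_INVALID = -1, VCS1 = 0, VCS2 = 1 };

struct device { int next_vcs; /* state for the default ping-pong mode */ };

/* Pick the video ring for a submission from its exec flags. */
static enum vcs_ring select_vcs(struct device *dev, uint32_t exec_flags)
{
    assert(dev != NULL);

    switch (exec_flags & EXEC_BSD_MASK) {
    case EXEC_BSD_DEFAULT: {
        /* No override: alternate between the two rings. */
        enum vcs_ring ring = dev->next_vcs ? VCS2 : VCS1;

        dev->next_vcs ^= 1;
        return ring;
    }
    case EXEC_BSD_RING1:
        return VCS1;  /* e.g. work that only one ring can execute */
    case EXEC_BSD_RING2:
        return VCS2;
    default:
        return VCS_INVALID;  /* unknown value in the field */
    }
}
```

Userspace that needs a particular ring, such as the HEVC case mentioned in the message, would set the override bits; everything else keeps the default behaviour.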
  18. 08 Jan, 2015 1 commit
  19. 24 Dec, 2014 1 commit
  20. 16 Dec, 2014 3 commits