1. 28 5月, 2019 3 次提交
  2. 24 5月, 2019 4 次提交
  3. 22 5月, 2019 6 次提交
    • T
      drm/i915: Engine discovery query · c5d3e39c
      Tvrtko Ursulin 提交于
      Engine discovery query allows userspace to enumerate engines, probe their
      configuration features, all without needing to maintain the internal PCI
      ID based database.
      
      A new query for the generic i915 query ioctl is added named
      DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure
      drm_i915_query_engine_info. The address of latter should be passed to the
      kernel in the query.data_ptr field, and should be large enough for the
      kernel to fill out all known engines as struct drm_i915_engine_info
      elements trailing the query.
      
      As with other queries, setting the item query length to zero allows
      userspace to query minimum required buffer size.
      
      Enumerated engines have common type mask which can be used to query all
      hardware engines, versus engines userspace can submit to using the execbuf
      uAPI.
      
      Engines also have capabilities which are per engine class namespace of
      bits describing features not present on all engine instances.
      
      v2:
       * Fixed HEVC assignment.
       * Reorder some fields, rename type to flags, increase width. (Lionel)
       * No need to allocate temporary storage if we do it engine by engine.
         (Lionel)
      
      v3:
       * Describe engine flags and mark mbz fields. (Lionel)
       * HEVC only applies to VCS.
      
      v4:
       * Squash SFC flag into main patch.
       * Tidy some comments.
      
      v5:
       * Add uabi_ prefix to engine capabilities. (Chris Wilson)
       * Report exact size of engine info array. (Chris Wilson)
       * Drop the engine flags. (Joonas Lahtinen)
       * Added some more reserved fields.
       * Move flags after class/instance.
      
      v6:
       * Do not check engine info array was zeroed by userspace but zero the
         unused fields for them instead.
      
      v7:
       * Simplify length calculation loop. (Lionel Landwerlin)
      
      v8:
       * Remove MBZ comments where not applicable.
       * Rename ABI flags to match engine class define naming.
       * Rename SFC ABI flag to reflect it applies to VCS and VECS.
       * SFC is wired to even _logical_ engine instances.
       * SFC applies to VCS and VECS.
       * HEVC is present on all instances on Gen11. (Tony)
       * Simplify length calculation even more. (Chris Wilson)
       * Move info_ptr assigment closer to loop for clarity. (Chris Wilson)
       * Use vdbox_sfc_access from runtime info.
       * Rebase for RUNTIME_INFO.
       * Refactor for lower indentation.
       * Rename uAPI class/instance to engine_class/instance to avoid C++
         keyword.
      
      v9:
       * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO.
      
      v10:
       * Use new copy_query_item.
      
      v11:
       * Consolidate with struct i915_engine_class_instnace.
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com
      c5d3e39c
    • T
      drm/i915/icl: Add WaDisableBankHangMode · cbe3e1d1
      Tvrtko Ursulin 提交于
      Disable GPU hang by default on unrecoverable ECC cache errors.
      
      v2:
       * Rebase.
      
      v3:
       * Use intel_uncore_read. (Chris)
      
      Fixes: cc38cae7 ("drm/i915/icl: Introduce initial Icelake Workarounds")
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520110442.403-2-tvrtko.ursulin@linux.intel.com
      cbe3e1d1
    • T
      drm/i915/selftests: Verify context workarounds · fde93886
      Tvrtko Ursulin 提交于
      Test context workarounds have been correctly applied in newly created
      contexts.
      
      To accomplish this the existing engine_wa_list_verify helper is extended
      to take in a context from which reading of the workaround list will be
      done.
      
      Context workaround verification is done from the existing subtests, which
      have been renamed to reflect they are no longer only about GT and engine
      workarounds.
      
      v2:
       * Test after resets and refactor to use intel_context more. (Chris)
      
      v3:
       * Use ce->engine->i915 instead of ce->gem_context->i915. (Chris)
       * gem_engine_iter.idx is engine->id + 1. (Chris)
      
      v4:
       * Make local function static.
      Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190520142546.12493-1-tvrtko.ursulin@linux.intel.com
      fde93886
    • C
      drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson 提交于
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
      ee113690
    • C
      drm/i915: Apply an execution_mask to the virtual_engine · 78e41ddd
      Chris Wilson 提交于
      Allow the user to direct which physical engines of the virtual engine
      they wish to execute one, as sometimes it is necessary to override the
      load balancing algorithm.
      
      v2: Only kick the virtual engines on context-out if required
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk
      78e41ddd
    • C
      drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson 提交于
      Having allowed the user to define a set of engines that they will want
      to only use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so, wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always take priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
      6d06779e
  4. 17 5月, 2019 3 次提交
    • C
      drm/i915/execlists: Drop promotion on unsubmit · 4cc79cbb
      Chris Wilson 提交于
      With the disappearance of NEWCLIENT, we no longer need to provide the
      priority boost on preemption in order to prevent repeated gazumping,
      and we can remove the dead code.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk
      4cc79cbb
    • C
      drm/i915: Downgrade NEWCLIENT to non-preemptive · 68fc728b
      Chris Wilson 提交于
      Commit 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") had the
      intended consequence of not allowing a sequence of work that merely
      crossed into a new engine the privilege to be promoted to NEWCLIENT
      status. It also had the unintended consequence of actually making
      NEWCLIENT effective on heavily oversubscribed transcode machines and
      impacting upon their throughput.
      
      If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of
      those clients, using the NEWCLIENT boost that will be scheduled as
      
      	rcsA x 30, (rcsB, vcs) x 30
      
      where as before it would have been
      
      	(rcsA, rcsB, vcs) x 30
      
      That is with NEWCLIENT only boosting the first request of each client,
      we would execute all rcsA requests prior to running on the vcs engines;
      acruing a lot of dead time as compared to the previous case where the
      vcs engine would be started in parallel to processing the second client.
      
      The previous patch has the effect of delaying submission until it is
      required by a third party (either the user with an explicit wait, or by
      another client/engine). We reduce the NEWCLIENT bump to a mere WAIT,
      which has the effect of removing its preemptive grant and reducing it to
      the same level as any other user interaction -- that it will not be
      promoted above the interengine dependencies, and so preventing NEWCLIENTS
      from starving other engines. This a large nerf to the rrul properties of
      the current NEWCLIENT, but it still does give prioritised submission to
      new requests from light workloads.
      
      References: b16c7651 ("drm/i915: Priority boost for new clients")
      Fixes: 1413b2bc ("drm/i915: Trim NEWCLIENT boosting") # customer impact
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk
      68fc728b
    • C
      drm/i915: Bump signaler priority on adding a waiter · 6e7eb7a8
      Chris Wilson 提交于
      The handling of the no-preemption priority level imposes the restriction
      that we need to maintain the implied ordering even though preemption is
      disabled. Otherwise we may end up with an AB-BA deadlock across multiple
      engine due to a real preemption event reordering the no-preemption
      WAITs. To resolve this issue we currently promote all requests to WAIT
      on unsubmission, however this interferes with the timeslicing
      requirement that we do not apply any implicit promotion that will defeat
      the round-robin timeslice list. (If we automatically promote the active
      request it will go back to the head of the queue and not the tail!)
      
      So we need implicit promotion to prevent reordering around semaphores
      where we are not allowed to preempt, and we must avoid implicit
      promotion on unsubmission. So instead of at unsubmit, if we apply that
      implicit promotion on adding the dependency, we avoid the semaphore
      deadlock and we also reduce the gains made by the promotion for user
      space waiting. Furthermore, by keeping the earlier dependencies at a
      higher level, we reduce the search space for timeslicing without
      altering runtime scheduling too badly (no dependencies at all will be
      assigned a higher priority for rrul).
      
      v2: Limit the bump to external edges (as originally intended) i.e.
      between contexts and out to the user.
      
      Testcase: igt/gem_concurrent_blit
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk
      6e7eb7a8
  5. 08 5月, 2019 5 次提交
  6. 07 5月, 2019 2 次提交
  7. 04 5月, 2019 1 次提交
    • C
      drm/i915: Disable semaphore busywaits on saturated systems · ca6e56f6
      Chris Wilson 提交于
      Asking the GPU to busywait on a memory address, perhaps not unexpectedly
      in hindsight for a shared system, leads to bus contention that affects
      CPU programs trying to concurrently access memory. This can manifest as
      a drop in transcode throughput on highly over-saturated workloads.
      
      The only clue offered by perf, is that the bus-cycles (perf stat -e
      bus-cycles) jumped by 50% when enabling semaphores. This corresponds
      with extra CPU active cycles being attributed to intel_idle's mwait.
      
      This patch introduces a heuristic to try and detect when more than one
      client is submitting to the GPU pushing it into an oversaturated state.
      As we already keep track of when the semaphores are signaled, we can
      inspect their state on submitting the busywait batch and if we planned
      to use a semaphore but were too late, conclude that the GPU is
      overloaded and not try to use semaphores in future requests. In
      practice, this means we optimistically try to use semaphores for the
      first frame of a transcode job split over multiple engines, and fail if
      there are multiple clients active and continue not to use semaphores for
      the subsequent frames in the sequence. Periodically, we try to
      optimistically switch semaphores back on whenever the client waits to
      catch up with the transcode results.
      
      With 1 client, on Broxton J3455, with the relative fps normalized by %cpu:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                                                    *                   |
      |                                                    *+                  |
      |                                                    **+                 |
      |                                                    **+  x              |
      |                                x               *  +**+  x              |
      |                                x  x       *    *  +***x xx             |
      |                                x  x       *    * *+***x *x             |
      |                                x  x*   +  *    * *****x *x x           |
      |                         +    x xx+x*   + ***   * ********* x   *       |
      |                         +    x xx+x*   * *** +** ********* xx  *       |
      |    *   +         ++++*  +    x*x****+*+* ***+*************+x*  *       |
      |*+ +** *+ + +* + *++****** *xxx**********x***+*****************+*++    *|
      |                                   |__________A_____M_____|             |
      |                           |_______________A____M_________|             |
      |                                 |____________A___M________|            |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.60475       3.50941       3.31123     3.2143953    0.21117399
      + 120        2.3826       3.57077       3.25101     3.1414161    0.28146407
      Difference at 95.0% confidence
      	-0.0729792 +/- 0.0629585
      	-2.27039% +/- 1.95864%
      	(Student's t, pooled s = 0.248814)
      * 120       2.35536       3.66713        3.2849     3.2059917    0.24618565
      No difference proven at 95.0% confidence
      
      With 10 clients over-saturating the pipeline:
      
      x no semaphores
      + drm-tip
      * patched
      +------------------------------------------------------------------------+
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                        **       |
      |                     ++                                    xx ***       |
      |                     ++                                    xx ***       |
      |                     ++                                    xxx***       |
      |                     ++                                    xxx***       |
      |                    +++                                    xxx***       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    +++                                    xx****       |
      |                    ++++                                   xx****       |
      |                   +++++                                   xx****       |
      |                   +++++                                 x x******      |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xxx*******     |
      |                  ++++++                                 xx********     |
      |                  ++++++                               xxxx********     |
      |                  ++++++                               xxxx********     |
      |                ++++++++                             xxxxx*********     |
      |+ +  +        + ++++++++                           xxx*xx**********x*  *|
      |                                                         |__A__|        |
      |                 |__AM__|                                               |
      |                                                            |__A_|      |
      +------------------------------------------------------------------------+
          N           Min           Max        Median           Avg        Stddev
      x 120       2.47855        2.8972       2.72376     2.7193402   0.074604933
      + 120       1.17367       1.77459       1.71977     1.6966782   0.085850697
      Difference at 95.0% confidence
      	-1.02266 +/- 0.0203502
      	-37.607% +/- 0.748352%
      	(Student's t, pooled s = 0.0804246)
      * 120       2.57868       3.00821       2.80142     2.7923878   0.058646477
      Difference at 95.0% confidence
      	0.0730476 +/- 0.0169791
      	2.68622% +/- 0.624383%
      	(Student's t, pooled s = 0.0671018)
      
      Indicating that we've recovered the regression from enabling semaphores
      on this saturated setup, with a hint towards an overall improvement.
      
      Very similar, but of smaller magnitude, results are observed on both
      Skylake(gt2) and Kabylake(gt4). This may be due to the reduced impact of
      bus-cycles, where we see a 50% hit on Broxton, it is only 10% on the big
      core, in this particular test.
      
      One observation to make here is that for a greedy client trying to
      maximise its own throughput, using semaphores is the right choice. It is
      only the holistic system-wide view that semaphores of one client
      impacts another and reduces the overall throughput where we would choose
      to disable semaphores.
      
      The most noticeable negactive impact this has is on the no-op
      microbenchmarks, which are also very notable for having no cpu bus load.
      In particular, this increases the runtime and energy consumption of
      gem_exec_whisper.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Dmitry Ermilov <dmitry.ermilov@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190504070707.30902-1-chris@chris-wilson.co.uk
      ca6e56f6
  8. 03 5月, 2019 2 次提交
  9. 02 5月, 2019 1 次提交
  10. 01 5月, 2019 1 次提交
  11. 30 4月, 2019 3 次提交
  12. 27 4月, 2019 6 次提交
  13. 26 4月, 2019 3 次提交
    • C
      drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) · 9ce9bdb0
      Chris Wilson 提交于
      Broadwater and the rest of gen4  do support being able to saving and
      reloading context specific registers between contexts, providing isolation
      of the basic GPU state (as programmable by userspace). This allows
      userspace to assume that the GPU retains their state from one batch to the
      next, minimising the amount of state it needs to reload and manually save
      across batches.
      
      v2: CONSTANT_BUFFER woes
      
      Running through piglit turned up an interesting issue, a GPU hang inside
      the context load. The context image includes the CONSTANT_BUFFER command
      that loads an address into a on-gpu buffer, and the context load was
      executing that immediately. However, since it was reading from the GTT
      there is no guarantee that the GTT retains the same configuration as
      when the context was saved, resulting in stray reads and a GPU hang.
      
      Having tried issuing a CONSTANT_BUFFER (to disable the command) from the
      ring before saving the context to no avail, we resort to patching out
      the instruction inside the context image before loading.
      
      This does impose that gen4 always reissues CONSTANT_BUFFER commands on
      each batch, but due to the use of a shared GTT that was and will remain
      a requirement.
      
      v3: ECOSKPD to the rescue
      
      Ville found the magic bit in the ECOSKPD to disable saving and restoring
      the CONSTANT_BUFFER from the context image, thereby completely avoiding
      the GPU hangs from chasing invalid pointers. This appears to be the
      default behaviour for gen5, and so we just need to tweak gen4 to match.
      
      v4: Fix spelling of ECOSKPD and discover it already exists
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: NKenneth Graunke <kenneth@whitecape.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190419172720.5462-1-chris@chris-wilson.co.uk
      9ce9bdb0
    • C
      drm/i915: Enable render context support for Ironlake (gen5) · 1215d28e
      Chris Wilson 提交于
      Ironlake does support being able to saving and reloading context specific
      registers between contexts, providing isolation of the basic GPU state
      (as programmable by userspace). This allows userspace to assume that the
      GPU retains their state from one batch to the next, minimising the
      amount of state it needs to reload, or manually save and restore.
      
      v2: Fix off-by-one in reading CXT_SIZE, and add a comment that the
      CXT_SIZE and context-layout do not match in bspec, but the difference is
      irrelevant as we overallocate the full page anyway (Ville).
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Kenneth Graunke <kenneth@whitecape.org>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-2-chris@chris-wilson.co.uk
      1215d28e
    • C
      drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context · 928f8f42
      Chris Wilson 提交于
      Despite what I think the prm recommends, commit f2253bd9
      ("drm/i915/ringbuffer: EMIT_INVALIDATE after switch context") turned out
      to be a huge mistake when enabling Ironlake contexts as the GPU would
      hang on either a MI_FLUSH or PIPE_CONTROL immediately following the
      MI_SET_CONTEXT of an active mesa context (more vanilla contexts, e.g.
      simple rendercopies with igt, do not suffer).
      
      Ville found the following clue,
      
        "[DevCTG+]: For the invalidate operation of the pipe control, the
         following pointers are affected. The
         invalidate operation affects the restore of these packets. If the pipe
         control invalidate operation is completed
         before the context save, the indirect pointers will not be restored from
         memory.
         1. Pipeline State Pointer
         2. Media State Pointer
         3. Constant Buffer Packet"
      
      which suggests by us emitting the INVALIDATE prior to the MI_SET_CONTEXT,
      we prevent the context-restore from chasing the dangling pointers within
      the image, and explains why this likely prevents the GPU hang.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190419111749.3910-1-chris@chris-wilson.co.uk
      928f8f42