1. 21 Jun 2019 (3 commits)
  2. 15 Jun 2019 (1 commit)
    • drm/i915: Keep contexts pinned until after the next kernel context switch · ce476c80
      Authored by Chris Wilson
      We need to keep the context image pinned in memory until after the GPU
      has finished writing into it. Since it continues to write as we signal
      the final breadcrumb, we need to keep it pinned until the request after
      it is complete. Currently we know the order in which requests execute on
      each engine, and so to remove that presumption we need to identify a
      request/context-switch we know must occur after our completion. Any
      request queued after the signal must imply a context switch; for
      simplicity, we use a fresh request from the kernel context.
      
      The sequence of operations for keeping the context pinned until saved is:
      
       - On context activation, we preallocate a node for each physical engine
         the context may operate on. This is to avoid allocations during
         unpinning, which may be from inside FS_RECLAIM context (aka the
         shrinker)
      
       - On context deactivation on retirement of the last active request (which
         is before we know the context has been saved), we add the
         preallocated node onto a barrier list on each engine
      
       - On engine idling, we emit a switch to kernel context. When this
         switch completes, we know that all previous contexts must have been
         saved, and so on retiring this request we can finally unpin all the
         contexts that were marked as deactivated prior to the switch.
      
      We can enhance this in future by flushing all the idle contexts on a
      regular heartbeat pulse of a switch to kernel context, which will also
      be used to check for hung engines.
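
      Sketched in code, the three steps above look roughly like this. Every
      name here is hypothetical; this outlines the pattern, not the actual
      i915 implementation:

          #include <linux/list.h>
          #include <linux/slab.h>

          #define MAX_ENGINES 8                   /* placeholder */

          struct context;

          struct barrier_node {                   /* hypothetical */
                  struct list_head link;
                  struct context *ctx;
          };

          struct engine {
                  int id;
                  struct list_head barrier_list;
          };

          struct context {
                  int num_engines;
                  struct barrier_node *barrier[MAX_ENGINES];
          };

          static void context_unpin(struct context *ctx); /* assumed elsewhere */

          /* Step 1: on activation, preallocate one node per physical engine
           * so that deactivation never allocates (it may run under
           * FS_RECLAIM, i.e. from inside the shrinker). */
          static int context_activate(struct context *ctx)
          {
                  int i;

                  for (i = 0; i < ctx->num_engines; i++) {
                          ctx->barrier[i] = kmalloc(sizeof(*ctx->barrier[i]),
                                                    GFP_KERNEL);
                          if (!ctx->barrier[i])
                                  return -ENOMEM;
                          ctx->barrier[i]->ctx = ctx;
                  }
                  return 0;
          }

          /* Step 2: on retirement of the last active request, queue the
           * preallocated node; the context image may still be being written
           * by the GPU at this point. */
          static void context_deactivate(struct context *ctx,
                                         struct engine *engine)
          {
                  list_add(&ctx->barrier[engine->id]->link,
                           &engine->barrier_list);
          }

          /* Step 3: runs when the switch-to-kernel-context request retires;
           * every context queued before that switch must have been saved,
           * so it is finally safe to unpin them. */
          static void engine_barrier_retired(struct engine *engine)
          {
                  struct barrier_node *node, *next;

                  list_for_each_entry_safe(node, next,
                                           &engine->barrier_list, link) {
                          list_del(&node->link);
                          context_unpin(node->ctx);
                          kfree(node);
                  }
          }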
      
      v2: intel_context_active_acquire/_release
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190614164606.15633-1-chris@chris-wilson.co.uk
  3. 14 Jun 2019 (3 commits)
  4. 12 Jun 2019 (1 commit)
  5. 06 Jun 2019 (1 commit)
  6. 28 May 2019 (1 commit)
  7. 08 May 2019 (2 commits)
  8. 27 Apr 2019 (1 commit)
  9. 25 Apr 2019 (3 commits)
  10. 21 Mar 2019 (1 commit)
  11. 08 Mar 2019 (1 commit)
  12. 06 Mar 2019 (1 commit)
  13. 28 Feb 2019 (2 commits)
  14. 09 Feb 2019 (1 commit)
    • drm/i915: Revoke mmaps and prevent access to fence registers across reset · 2caffbf1
      Authored by Chris Wilson
      Previously, we were able to rely on the recursive properties of
      struct_mutex to allow us to serialise revoking mmaps and reacquiring the
      FENCE registers with them being clobbered over a global device reset.
      I then proceeded to throw out the baby with the bath water in order to
      pursue a struct_mutex-less reset.
      
      Perusing LWN for alternative strategies, the dilemma on how to serialise
      access to a global resource on one side was answered by
      https://lwn.net/Articles/202847/ -- Sleepable RCU:
      
          int readside(void)
          {
              int idx;

              rcu_read_lock();
              if (nomoresrcu) {
                  rcu_read_unlock();
                  return -EINVAL;
              }
              idx = srcu_read_lock(&ss);
              rcu_read_unlock();
              /* SRCU read-side critical section. */
              srcu_read_unlock(&ss, idx);
              return 0;
          }

          void cleanup(void)
          {
              nomoresrcu = 1;
              synchronize_rcu();
              synchronize_srcu(&ss);
              cleanup_srcu_struct(&ss);
          }
      
      No more worrying about stop_machine, just an uber-complex mutex,
      optimised for reads, with the overhead pushed to the rare reset path.
      
      However, we do run the risk of a deadlock as we allocate underneath the
      SRCU read lock, and the allocation may require a GPU reset, causing a
      dependency cycle via the in-flight requests. We resolve that by declaring
      the driver wedged and cancelling all in-flight rendering.
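
      Applied to the reset path, the pattern looks roughly like this. This
      is a sketch with hypothetical helper names; only the SRCU calls
      themselves (DEFINE_SRCU, srcu_read_lock, srcu_read_unlock,
      synchronize_srcu_expedited) are real kernel APIs:

          #include <linux/srcu.h>

          DEFINE_SRCU(reset_backoff_srcu);    /* named per the v5 note below */

          /* Read side: wrapped around any access to the fence registers or
           * the revocable GGTT mmaps; cheap, never blocked by other readers. */
          static int gt_reset_lock(void)
          {
                  return srcu_read_lock(&reset_backoff_srcu);
          }

          static void gt_reset_unlock(int tag)
          {
                  srcu_read_unlock(&reset_backoff_srcu, tag);
          }

          /* Write side: the rare reset waits for every reader to drain
           * before clobbering global state; expedited to keep the earlier
           * reset latency (the v2 note below). */
          static void gt_global_reset(void)
          {
                  synchronize_srcu_expedited(&reset_backoff_srcu);
                  /* ... revoke mmaps, reset the GPU, restore the fences ... */
          }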
      
      v2: Use expedited rcu barriers to match our earlier timing
      characteristics.
      v3: Try to annotate locking contexts for sparse
      v4: Reduce selftest lock duration to avoid a reset deadlock with fences
      v5: s/srcu/reset_backoff_srcu/
      v6: Remove more stale comments
      
      Testcase: igt/gem_mmap_gtt/hang
      Fixes: eb8d0f5a ("drm/i915: Remove GPU reset dependence on struct_mutex")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Mika Kuoppala <mika.kuoppala@intel.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk
  15. 29 Jan 2019 (1 commit)
  16. 25 Jan 2019 (1 commit)
  17. 22 Jan 2019 (1 commit)
  18. 16 Jan 2019 (1 commit)
  19. 15 Jan 2019 (1 commit)
  20. 08 Jan 2019 (1 commit)
  21. 07 Jul 2018 (1 commit)
  22. 24 May 2018 (1 commit)
  23. 09 May 2018 (1 commit)
  24. 04 May 2018 (1 commit)
    • drm/i915: Lazily unbind vma on close · 3365e226
      Authored by Chris Wilson
      When userspace is passing around swapbuffers using DRI, we frequently
      have to open and close the same object in the foreign address space.
      This shows itself as the same object being rebound at roughly 30fps
      (with a second object also being rebound at 30fps), which involves us
      having to rewrite the page tables and maintain the drm_mm range manager
      every time.
      
      However, since the object still exists and it is only the local handle
      that disappears, if we are lazy and do not unbind the VMA immediately
      when the local user closes the object but defer it until the GPU is
      idle, then we can reuse the same VMA binding. We still have to be
      careful to mark the handle and lookup tables as closed to maintain the
      uABI, just allowing the underlying VMA to be resurrected if the user is
      able to access the same object from the same context again.
      
      If the object itself is destroyed (no userspace handle keeping it
      alive), the VMA will be reaped immediately as usual.
      
      In the future, this will be even more useful as instantiating a new VMA
      for use on the GPU will become heavier. A nuisance indeed, so nip it in
      the bud.
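
      A schematic of the lazy close (hypothetical names; the real code
      keeps rather more state):

          #include <linux/list.h>

          struct vma_sketch {
                  struct list_head closed_link;
                  bool closed;            /* handle gone, binding retained */
          };

          struct vm_sketch {
                  struct list_head closed_vma; /* reaped when the GPU idles */
          };

          /* On handle close: hide the vma from lookups to keep the uABI,
           * but leave the binding (page tables, drm_mm node) intact. */
          static void vma_close(struct vm_sketch *vm, struct vma_sketch *vma)
          {
                  vma->closed = true;
                  list_add(&vma->closed_link, &vm->closed_vma);
          }

          /* On GPU idle (or object destruction): actually unbind. */
          static void vm_reap_closed(struct vm_sketch *vm)
          {
                  struct vma_sketch *vma, *next;

                  list_for_each_entry_safe(vma, next, &vm->closed_vma,
                                           closed_link) {
                          list_del(&vma->closed_link);
                          /* ... unbind page tables, release the drm_mm range ... */
                  }
          }

          /* If the user reopens the same object in the same context before
           * the reaper runs, the vma is simply resurrected. */
          static void vma_reopen(struct vma_sketch *vma)
          {
                  if (vma->closed) {
                          list_del(&vma->closed_link);
                          vma->closed = false;
                  }
          }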
      
      v2: s/__i915_vma_final_close/i915_vma_destroy/ etc.
      v3: Leave a hint as to why we deferred the unbind on close.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180503195115.22309-1-chris@chris-wilson.co.uk
  25. 03 May 2018 (2 commits)
    • drm/i915: Split i915_gem_timeline into individual timelines · a89d1f92
      Authored by Chris Wilson
      We need to move to a more flexible timeline that doesn't assume one
      fence context per engine, and so allow for a single timeline to be used
      across a combination of engines. This means that preallocating a fence
      context per engine is now a hindrance, and so we want to introduce the
      singular timeline. From the code perspective, this has the notable
      advantage of clearing up a lot of murky semantics and some clumsy
      pointer chasing.
      
      By splitting the timeline up into a single entity rather than an array
      of per-engine timelines, we can realise the goal of the previous patch
      of tracking the timeline alongside the ring.
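
      Schematically, the split replaces the per-engine bundle with one
      free-standing timeline (illustrative types only, not the actual
      structures):

          #include <linux/types.h>
          #include <linux/list.h>

          #define NUM_ENGINES 4           /* placeholder */

          /* Before: one object carrying a preallocated fence context per
           * engine, whether or not that engine is ever used. */
          struct gem_timeline_sketch {
                  struct {
                          u64 fence_context;
                          struct list_head requests;
                  } engine[NUM_ENGINES];
          };

          /* After: a single, self-contained timeline that any combination
           * of engines may execute along, tracked alongside its ring. */
          struct timeline_sketch {
                  u64 fence_context;      /* one context, not one per engine */
                  u32 seqno;
                  struct list_head requests;
          };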
      
      v2: Tweak wait_for_idle to stop the compiler from thinking that ret may be
      uninitialised.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
    • drm/i915: Move timeline from GTT to ring · 65fcb806
      Authored by Chris Wilson
      In the future, we want to move a request between engines. To achieve
      this, we first realise that we have two timelines in effect here. The
      first runs through the GTT and is required for ordering vma access, which is
      tracked currently by engine. The second is implied by sequential
      execution of commands inside the ringbuffer. This timeline is one that
      maps to userspace's expectations when submitting requests (i.e. given the
      same context, batch A is executed before batch B). As the rings' timelines
      map to userspace and the GTT timeline is an implementation
      detail, move the timeline from the GTT into the ring itself (per-context
      in logical-ring-contexts/execlists, or a global per-engine timeline for
      the shared ringbuffers in legacy submission).
      
      The two timelines are still assumed to be equivalent at the moment (no
      migrating requests between engines yet) and so we can simply move from
      one to the other without adding extra ordering.
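
      In sketch form, ownership of the userspace-visible timeline moves
      from the address space to the ring (hypothetical field names):

          struct timeline_sketch;         /* as in the sketch above */

          /* Before: the timeline hung off the GTT, ordering vma access
           * and tracked per engine. */
          struct address_space_sketch {
                  struct timeline_sketch *timeline;
          };

          /* After: the ring carries the timeline userspace observes
           * (batch A before batch B within a context); per-context under
           * execlists, one global instance per engine for legacy
           * ringbuffer submission. */
          struct ring_sketch {
                  struct timeline_sketch *timeline;
          };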
      
      v2: Reinforce that one isn't allowed to mix the engine execution
      timeline with the client timeline from userspace (on the ring).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
  26. 30 Apr 2018 (2 commits)
    • drm/i915: Only track live rings for retiring · 643b450a
      Authored by Chris Wilson
      We don't need to track every ring for its lifetime as they are managed
      by the contexts/engines. What we do want to track are the live rings so
      that we can sporadically clean up requests if userspace falls behind. We
      can simply restrict the gt->rings list to being only gt->live_rings.
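
      A minimal sketch of that bookkeeping (hypothetical names; the list
      ends up being called active_rings per the v2 note below): a ring
      joins the list when its first request is submitted and drops off
      once its last request retires.

          #include <linux/list.h>

          struct ring_state {                 /* hypothetical */
                  struct list_head active_link;
          };

          struct gt_state {                   /* hypothetical */
                  struct list_head active_rings; /* only rings with requests */
          };

          /* First request on an idle ring: start tracking it so stale
           * requests can be cleaned up if userspace falls behind. */
          static void ring_became_active(struct gt_state *gt,
                                         struct ring_state *ring)
          {
                  list_add(&ring->active_link, &gt->active_rings);
          }

          /* Last request retired: nothing left to clean up, stop walking. */
          static void ring_became_idle(struct ring_state *ring)
          {
                  list_del_init(&ring->active_link);
          }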
      
      v2: s/live/active/ for consistency with gt.active_requests
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
    • drm/i915: Retire requests along rings · b887d615
      Authored by Chris Wilson
      In the next patch, rings are the central timeline as requests may jump
      between engines. Therefore in the future as we retire in order along the
      engine timeline, we may retire out-of-order within a ring (as the ring now
      occurs along multiple engines), leading to much hilarity in miscomputing
      the position of ring->head.
      
      As an added bonus, retiring along the ring reduces the penalty of having
      one execlists client do cleanup for another (old legacy submission
      shares a ring between all clients). The downside is that the slow and
      irregular (off the critical path) process of cleaning up stale requests
      after userspace has finished with them becomes a modicum less efficient.
      
      In the long run, it will become apparent that the ordered
      ring->request_list matches the ring->timeline, a fun challenge for the
      future will be unifying the two lists to avoid duplication!
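
      The retirement walk itself is simple; a schematic sketch with
      hypothetical names (the real request and ring structures carry far
      more state):

          #include <linux/list.h>

          struct request_sketch {
                  struct list_head ring_link; /* submission order in the ring */
                  bool completed;
                  unsigned int tail;          /* where this request ends */
          };

          struct ring_sketch {
                  struct list_head request_list; /* oldest first */
                  unsigned int head;
          };

          /* Retire along the ring: walk from the oldest request and stop
           * at the first still busy, so ring->head only ever advances in
           * order, regardless of which engines the requests ran on. */
          static void ring_retire_requests(struct ring_sketch *ring)
          {
                  struct request_sketch *rq, *next;

                  list_for_each_entry_safe(rq, next, &ring->request_list,
                                           ring_link) {
                          if (!rq->completed)
                                  break;
                          ring->head = rq->tail; /* space behind is now free */
                          list_del(&rq->ring_link);
                  }
          }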
      
      v2: We need both engine-order and ring-order processing to maintain our
      knowledge of where individual rings have completed up to, as well as
      knowing what was last executing on any engine. And finally, by decoupling
      retiring the contexts on the engine and the timelines along the rings,
      we do have to keep a reference to the context on each request
      (previously it was guaranteed by the context being pinned).
      
      v3: Not just a reference to the context, but we need to keep it pinned
      as we manipulate the rings; i.e. we need a pin for both the manipulation
      of the engine state during its retirements, and a separate pin for the
      manipulation of the ring state.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
  27. 22 Feb 2018 (1 commit)
  28. 08 Feb 2018 (1 commit)
  29. 11 Dec 2017 (1 commit)
  30. 11 Nov 2017 (1 commit)
    • drm/i915/selftests: Yet another forgotten mock_i915->mm initialiser · 9c52d1c8
      Authored by Chris Wilson
      Move all of the i915->mm initialisation to a private function that can
      be reused by the mock i915 device to save forgetting any more steps.
      
      For example,
      <7>[ 1542.046332] [IGT] drv_selftest: starting subtest mock_objects
      <4>[ 1542.123924] Setting dangerous option mock_selftests - tainting kernel
      <6>[ 1542.167941] i915: Performing mock selftests with st_random_seed=0x246f5ab5 st_timeout=1000
      <4>[ 1542.178012] INFO: trying to register non-static key.
      <4>[ 1542.178027] the code is fine but needs lockdep annotation.
      <4>[ 1542.178032] turning off the locking correctness validator.
      <4>[ 1542.178041] CPU: 3 PID: 6008 Comm: kworker/3:7 Tainted: G     U          4.14.0-rc8-CI-CI_DRM_3332+ #1
      <4>[ 1542.178049] Hardware name:                  /NUC6CAYB, BIOS AYAPLCEL.86A.0040.2017.0619.1722 06/19/2017
      <4>[ 1542.178144] Workqueue: events __i915_gem_free_work [i915]
      <4>[ 1542.178152] Call Trace:
      <4>[ 1542.178163]  dump_stack+0x68/0x9f
      <4>[ 1542.178170]  register_lock_class+0x3fd/0x580
      <4>[ 1542.178177]  ? unwind_next_frame+0x14/0x20
      <4>[ 1542.178184]  ? __save_stack_trace+0x73/0xd0
      <4>[ 1542.178191]  __lock_acquire+0xa4/0x1b00
      <4>[ 1542.178254]  ? __i915_gem_free_work+0x28/0xa0 [i915]
      <4>[ 1542.178261]  ? __lock_acquire+0x4ab/0x1b00
      <4>[ 1542.178268]  lock_acquire+0xb0/0x200
      <4>[ 1542.178273]  ? lock_acquire+0xb0/0x200
      <4>[ 1542.178336]  ? __i915_gem_free_work+0x28/0xa0 [i915]
      <4>[ 1542.178344]  _raw_spin_lock+0x32/0x50
      <4>[ 1542.178405]  ? __i915_gem_free_work+0x28/0xa0 [i915]
      <4>[ 1542.178468]  __i915_gem_free_work+0x28/0xa0 [i915]
      <4>[ 1542.178476]  process_one_work+0x221/0x650
      <4>[ 1542.178483]  worker_thread+0x4e/0x3c0
      <4>[ 1542.178489]  kthread+0x114/0x150
      <4>[ 1542.178494]  ? process_one_work+0x650/0x650
      <4>[ 1542.178499]  ? kthread_create_on_node+0x40/0x40
      <4>[ 1542.178506]  ret_from_fork+0x27/0x40
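
      The shape of the fix, sketched with assumed names (the real
      initialiser covers more of i915->mm): one private helper is called
      by both the real driver and the mock device, so the spinlock used
      by the free worker is always initialised and lockdep sees a
      registered key.

          #include <linux/spinlock.h>
          #include <linux/list.h>
          #include <linux/workqueue.h>

          struct mm_sketch {                  /* stand-in for i915->mm */
                  spinlock_t free_lock;       /* the lock lockdep tripped over */
                  struct list_head free_list;
                  struct work_struct free_work;
          };

          static void free_work_fn(struct work_struct *work)
          {
                  /* ... the __i915_gem_free_work equivalent ... */
          }

          /* One private initialiser shared by the real driver probe and
           * the mock device, so no step (e.g. the spin_lock_init() whose
           * absence produced the splat above) can be forgotten again. */
          static void mm_init_sketch(struct mm_sketch *mm)
          {
                  spin_lock_init(&mm->free_lock);
                  INIT_LIST_HEAD(&mm->free_list);
                  INIT_WORK(&mm->free_work, free_work_fn);
          }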
      
      v2: Fish out i915->mm.object_stat_lock which was being inited over in
      i915_drv.c (Matthew)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.auld@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171110232447.21618-1-chris@chris-wilson.co.uk
      Reviewed-by: Matthew Auld <matthew.auld@intel.com>