1. 11 Jun, 2018 2 commits
    • C
      drm/i915: Wrap around the tail offset before setting ring->tail · 41d37680
      Chris Wilson committed
      The HW only accepts offsets within ring->size, and fails peculiarly if
      the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
      set ring->head/ring->tail we want to make sure it is within range (using
      intel_ring_wrap()).
      
      v2: Double check execlists as well
      v3: Remove redundancy with assert_ring_tail_valid()
      v4: Just assert in intel_ring_reset() rather than be over-defensive.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2
      Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-2-chris@chris-wilson.co.uk
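A minimal sketch of the wrapping described in this commit, assuming the ring size is a power of two so that an offset can be folded back into range with a mask. The struct and the helper names ring_sketch, ring_wrap and ring_advance_tail are illustrative stand-ins, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's ring; only the fields used here. */
struct ring_sketch {
	uint32_t size;	/* in bytes, assumed to be a power of two */
	uint32_t head;
	uint32_t tail;
};

/* Fold an offset back into [0, size). Relies on size being a power of two. */
static uint32_t ring_wrap(const struct ring_sketch *ring, uint32_t pos)
{
	assert(ring->size != 0 && (ring->size & (ring->size - 1)) == 0);
	return pos & (ring->size - 1);
}

/* Advance the tail by len bytes; the stored tail is never equal to size. */
static void ring_advance_tail(struct ring_sketch *ring, uint32_t len)
{
	ring->tail = ring_wrap(ring, ring->tail + len);
	assert(ring->tail < ring->size);
}
```

With a 4096-byte ring and a tail of 4000, advancing by 96 bytes stores 0 rather than 4096, which is the case the commit message says the hardware handles badly.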
    • C
      drm/i915/ringbuffer: Fix context restore upon reset · b3ee09a4
      Chris Wilson committed
      The discovery with trying to enable full-ppgtt was that we were
      completely failing to load both the mm and context following the
      reset. Although we were performing mmio to set the PP_DIR (per-process
      GTT) and CCID (context), these were taking no effect (the assumption was
      that this would trigger reload of the context and restore the page
      tables). Nothing occurred until we performed the LRI + MI_SET_CONTEXT in
      a following context switch.
      
      Since we are then required to reset the context image and PP_DIR using
      CS commands, we place those commands into every batch. The hardware
      should recognise the no-ops and eliminate the expensive context loads,
      but we still have to pay the cost of using cross-powerwell register
      writes. In practice, this has no effect on actual context switch times,
      and only adds a few hundred nanoseconds to no-op switches. We can improve
      the latter by eliminating the w/a around known no-op switches, but there
      is an ulterior motive to keeping them.
      
      Always emitting the context switch at the beginning of the request (and
      relying on HW to skip unneeded switches) does have one key advantage.
      Should we implement request reordering on Haswell, we will not know in
      advance what the previous executing context was on the GPU and so we
      would not be able to elide the MI_SET_CONTEXT commands ourselves and
      always have to emit them. Having our hand forced now actually prepares
      us for later.
      
      Now since that context and mm follow the request, we no longer (and not
      for a long time since requests took over!) require a trace point to tell
      when we write the switch into the ring, since we always do. (This is
      even more important when you remember that simply writing into the ring
      bears no relation to the current mm.)
      
      v2: Sandybridge has to agree to use LRI as well.
      
      Testcase: igt/drv_selftests/live_hangcheck
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
  2. 19 May, 2018 1 commit
  3. 18 May, 2018 2 commits
  4. 17 May, 2018 2 commits
  5. 03 May, 2018 2 commits
    • C
      drm/i915: Split i915_gem_timeline into individual timelines · a89d1f92
      Chris Wilson committed
      We need to move to a more flexible timeline that doesn't assume one
      fence context per engine, and so allow for a single timeline to be used
      across a combination of engines. This means that preallocating a fence
      context per engine is now a hindrance, and so we want to introduce the
      singular timeline. From the code perspective, this has the notable
      advantage of clearing up a lot of murky semantics and some clumsy
      pointer chasing.
      
      By splitting the timeline up into a single entity rather than an array
      of per-engine timelines, we can realise the goal of the previous patch
      of tracking the timeline alongside the ring.
      
      v2: Tweak wait_for_idle to stop the compiler thinking that ret may be
      uninitialised.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
    • C
      drm/i915: Move timeline from GTT to ring · 65fcb806
      Chris Wilson committed
      In the future, we want to move a request between engines. To achieve
      this, we first realise that we have two timelines in effect here. The
      first runs through the GTT and is required for ordering vma access, which
      is tracked currently by engine. The second is implied by sequential
      execution of commands inside the ringbuffer. This timeline is the one that
      maps to userspace's expectations when submitting requests (i.e. given the
      same context, batch A is executed before batch B). As the ring's
      timeline maps to userspace and the GTT timeline is an implementation
      detail, move the timeline from the GTT into the ring itself (per-context
      in logical-ring-contexts/execlists, or a global per-engine timeline for
      the shared ringbuffers in legacy submission).
      
      The two timelines are still assumed to be equivalent at the moment (no
      migrating requests between engines yet) and so we can simply move from
      one to the other without adding extra ordering.
      
      v2: Reinforce that one isn't allowed to mix the engine execution
      timeline with the client timeline from userspace (on the ring).
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
  6. 30 Apr, 2018 2 commits
    • C
      drm/i915: Only track live rings for retiring · 643b450a
      Chris Wilson committed
      We don't need to track every ring for its lifetime as they are managed
      by the contexts/engines. What we do want to track are the live rings so
      that we can sporadically clean up requests if userspace falls behind. We
      can simply restrict the gt->rings list to being only gt->live_rings.
      
      v2: s/live/active/ for consistency with gt.active_requests
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
    • C
      drm/i915: Retire requests along rings · b887d615
      Chris Wilson committed
      In the next patch, rings are the central timeline as requests may jump
      between engines. Therefore in the future as we retire in order along the
      engine timeline, we may retire out-of-order within a ring (as the ring now
      occurs along multiple engines), leading to much hilarity in miscomputing
      the position of ring->head.
      
      As an added bonus, retiring along the ring reduces the penalty of having
      one execlists client do cleanup for another (old legacy submission
      shares a ring between all clients). The downside is that the slow and
      irregular (off the critical path) process of cleaning up stale requests
      after userspace becomes a modicum less efficient.
      
      In the long run, it will become apparent that the ordered
      ring->request_list matches the ring->timeline, a fun challenge for the
      future will be unifying the two lists to avoid duplication!
      
      v2: We need both engine-order and ring-order processing to maintain our
      knowledge of where individual rings have completed up to as well as
      knowing what was last executing on any engine. And finally by decoupling
      retiring the contexts on the engine and the timelines along the rings,
      we do have to keep a reference to the context on each request
      (previously it was guaranteed by the context being pinned).
      
      v3: Not just a reference to the context, but we need to keep it pinned
      as we manipulate the rings; i.e. we need a pin for both the manipulation
      of the engine state during its retirements, and a separate pin for the
      manipulation of the ring state.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
  7. 26 Apr, 2018 1 commit
  8. 19 Apr, 2018 1 commit
  9. 12 Apr, 2018 1 commit
  10. 04 Apr, 2018 1 commit
  11. 03 Apr, 2018 1 commit
  12. 20 Mar, 2018 1 commit
  13. 15 Mar, 2018 2 commits
  14. 13 Mar, 2018 1 commit
  15. 10 Mar, 2018 1 commit
  16. 08 Mar, 2018 1 commit
    • L
      drm/i915: store all subslice masks · 8cc76693
      Lionel Landwerlin committed
      Up to now, the subslice mask was assumed to be uniform across slices. But
      starting with Cannonlake, slices can be asymmetric (for example slice0
      has a different number of subslices than slice1+). This change stores
      subslice masks for all slices rather than having a single mask that
      applies to all slices.
      
      v2: Rework how we store total numbers in sseu_dev_info (Tvrtko)
          Fix CHV eu masks, was reading disabled as enabled (Tvrtko)
          Readability changes (Tvrtko)
          Add EU index helper (Tvrtko)
      
      v3: Turn ALIGN(v, 8) / 8 into DIV_ROUND_UP(v, BITS_PER_BYTE) (Tvrtko)
          Reuse sseu_eu_idx() for setting eu_mask on CHV (Tvrtko)
          Reformat debug prints for subslices (Tvrtko)
      
      v4: Change eu_mask helper into sseu_set_eus() (Tvrtko)
      
      v5: With Haswell reporting masks & counts, bump sseu_*_eus() functions
          to use u16 (Lionel)
      
      v6: Fix sseu_get_eus() for > 8 EUs per subslice (Lionel)
      
      v7: Change debugfs enabels for number of subslices per slice, will
          need a small igt/pm_sseu change (Lionel)
          Drop subslice_total field from sseu_dev_info, rely on
          sseu_subslice_total() to recompute the value instead (Lionel)
      
      v8: Remove unused function compute_subslice_total() (Lionel)
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-2-lionel.g.landwerlin@intel.com
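As a rough illustration of storing one subslice mask per slice and recomputing the total on demand, the sketch below uses a hypothetical struct and helper names (sseu_sketch, sketch_subslice_total, mask_bytes) and an arbitrary MAX_SLICES limit; the driver's real sseu_dev_info layout is not reproduced here. DIV_ROUND_UP and BITS_PER_BYTE are redefined locally so the example stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLICES 3		/* illustrative limit, not the driver's */
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical layout: one subslice mask per slice instead of a shared one. */
struct sseu_sketch {
	uint8_t slice_mask;
	uint8_t subslice_mask[MAX_SLICES];
};

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned int popcount8(uint8_t v)
{
	unsigned int count = 0;

	while (v) {
		v &= (uint8_t)(v - 1);
		count++;
	}
	return count;
}

/* Recompute the total from the masks rather than caching it in the struct. */
static unsigned int sketch_subslice_total(const struct sseu_sketch *sseu)
{
	unsigned int s, total = 0;

	for (s = 0; s < MAX_SLICES; s++)
		if (sseu->slice_mask & (1u << s))
			total += popcount8(sseu->subslice_mask[s]);
	return total;
}

/* Bytes needed to hold a bitmask of the given width. */
static unsigned int mask_bytes(unsigned int bits)
{
	return DIV_ROUND_UP(bits, BITS_PER_BYTE);
}
```

Asymmetric slices fall out naturally: a slice with mask 0x7 and another with mask 0x3 contribute three and two subslices respectively, and a slice that is absent from slice_mask contributes nothing.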
  17. 07 Mar, 2018 2 commits
  18. 06 Mar, 2018 1 commit
    • C
      drm/i915/breadcrumbs: Reduce signaler rbtree to a sorted list · cd46c545
      Chris Wilson committed
      The goal here is to try and reduce the latency of signaling additional
      requests following the wakeup from interrupt by reducing the list of
      to-be-signaled requests from an rbtree to a sorted linked list. The
      original choice of using an rbtree was to facilitate random insertions
      of request into the signaler while maintaining a sorted list. However,
      if we assume that most new requests are added when they are submitted,
      we see those new requests in execution order making an insertion sort
      fast, and the reduction in overhead of each signaler iteration
      significant.
      
      Since commit 56299fb7 ("drm/i915: Signal first fence from irq handler
      if complete"), we signal most fences directly from notify_ring() in the
      interrupt handler greatly reducing the amount of work that actually
      needs to be done by the signaler kthread. All the thread is then
      required to do is operate as the bottom-half, cleaning up after the
      interrupt handler and preparing the next waiter. This includes signaling
      all later completed fences in a saturated system, but on a mostly idle
      system we only have to rebuild the wait rbtree in time for the next
      interrupt. With this de-emphasis of the signaler's role, we want to
      rejig its data structures to reduce the amount of work we require to
      both set up the signal tree and maintain it on every interrupt.
      
      References: 56299fb7 ("drm/i915: Signal first fence from irq handler if complete")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-1-chris@chris-wilson.co.uk
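The argument for a sorted list over an rbtree can be sketched as follows, assuming requests are ordered by a 32-bit seqno and usually arrive in execution order. The node type and function names are hypothetical; the list is a circular doubly linked list with a sentinel, in the spirit of the kernel's list_head but written out by hand so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: a request waiting to be signaled, ordered by seqno. */
struct signal_node {
	uint32_t seqno;
	struct signal_node *prev, *next;
};

/* An empty circular list is a sentinel that points at itself. */
static void signal_list_init(struct signal_node *head)
{
	head->prev = head->next = head;
}

/* Wraparound-tolerant test for "a is later than b". */
static int seqno_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

/*
 * Insert while keeping ascending seqno order. The walk starts at the tail,
 * so a request newer than everything queued is placed without stepping over
 * any node. Returns the number of nodes stepped over.
 */
static unsigned int signal_list_insert(struct signal_node *head,
				       struct signal_node *node)
{
	struct signal_node *pos = head->prev;
	unsigned int steps = 0;

	while (pos != head && seqno_after(pos->seqno, node->seqno)) {
		pos = pos->prev;
		steps++;
	}

	/* Link node immediately after pos. */
	node->prev = pos;
	node->next = pos->next;
	pos->next->prev = node;
	pos->next = node;
	return steps;
}
```

When submissions arrive in order, each insert steps over zero nodes, and the signaler can then walk the list from the front without any tree rebalancing. Only out-of-order arrivals pay for the backwards walk.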
  19. 01 Mar, 2018 1 commit
  20. 24 Feb, 2018 1 commit
  21. 22 Feb, 2018 1 commit
  22. 14 Feb, 2018 1 commit
  23. 12 Feb, 2018 1 commit
  24. 08 Feb, 2018 1 commit
  25. 06 Feb, 2018 1 commit
    • T
      drm/i915/pmu: Fix PMU enable vs execlists tasklet race · b2f78cda
      Tvrtko Ursulin committed
      Commit 99e48bf9 ("drm/i915: Lock out execlist tasklet while peeking
      inside for busy-stats") added a tasklet_disable call in busy stats
      enabling, but we failed to understand that the PMU enable callback runs
      as a hard IRQ (IPI).
      
      Consequence of this is that the PMU enable callback can interrupt the
      execlists tasklet, and will then deadlock when it calls
      intel_engine_stats_enable->tasklet_disable.
      
      To fix this, I realized it is possible to move the engine stats enablement
      and disablement to PMU event init and destroy hooks. This allows for much
      simpler implementation since those hooks run in normal context (can
      sleep).
      
      v2: Extract engine_event_destroy. (Chris Wilson)
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Fixes: 99e48bf9 ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats")
      Testcase: igt/perf_pmu/enable-race-*
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: intel-gfx@lists.freedesktop.org
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com
  26. 09 Dec, 2017 1 commit
  27. 08 Dec, 2017 1 commit
  28. 29 Nov, 2017 2 commits
  29. 24 Nov, 2017 1 commit
  30. 23 Nov, 2017 1 commit
  31. 22 Nov, 2017 2 commits
    • T
      drm/i915/pmu: Wire up engine busy stats to PMU · b3add01e
      Tvrtko Ursulin committed
      We can use engine busy stats instead of the sampling timer for
      better accuracy.
      
      By doing this we replace the stochastic sampling with a busyness
      metric derived directly from engine activity. This is context
      switch interrupt driven, so as accurate as we can get from
      software tracking.
      
      As a secondary benefit, we can also not run the sampling timer
      in cases where only the busyness metric is enabled.
      
      v2: Rebase.
      v3:
       * Rebase, comments.
       * Leave engine busyness controls out of workers.
      v4: Checkpatch cleanup.
      v5: Added comment to pmu_needs_timer change.
      v6:
       * Rebase.
       * Fix style of some comments. (Chris Wilson)
      v7: Rebase and commit message update. (Chris Wilson)
      v8: Add delayed stats disabling to improve accuracy in face of
          CPU hotplug events.
      v9: Rebase.
      v10: Rebase - i915_modparams.enable_execlists removal.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-6-tvrtko.ursulin@linux.intel.com
    • T
      drm/i915: Engine busy time tracking · 30e17b78
      Tvrtko Ursulin committed
      Track total time requests have been executing on the hardware.
      
      We add new kernel API to allow software tracking of time GPU
      engines are spending executing requests.
      
      Both per-engine and global APIs are added, with the latter also
      being exported for use by external users.
      
      v2:
       * Squashed with the internal API.
       * Dropped static key.
       * Made per-engine.
       * Store time in monotonic ktime.
      
      v3: Moved stats clearing to disable.
      
      v4:
       * Comments.
       * Don't export the API just yet.
      
      v5: Whitespace cleanup.
      
      v6:
       * Rename ref to active.
       * Drop engine aggregate stats for now.
       * Account initial busy period after enabling stats.
      
      v7:
       * Rebase.
      
      v8:
       * Move context in notification after the notifier. (Chris Wilson)
      
      v9:
      
      In cases where stats tracking is getting disabled while there is
      an active context on an engine, add up the current value to the
      total. This also implies we don't clear the total when tracking
      is disabled any longer. There is no real need to do so because
      we define the stats as relative while enabled, meaning
      comparison between two samples while tracking is enabled is the
      valid usage. However, when busy stats will later be plugged into
      the perf PMU API, it is beneficial to not reset the total, since
      the PMU core likes to do some counter disable/enable cycles on
      startup, and while doing so during a single long context
      executing on an engine we would lose some accuracy and so make
      unit testing more difficult than it needs to be.
      
      v10:
       * Fix accounting for preemption.
      
      v11:
       * Rebase for i915_modparams.enable_execlists removal.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com
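The accounting rules described in v6 and v9 can be sketched roughly as below. This assumes timestamps are passed in as plain nanosecond counts instead of being read from a clock, and every name (busy_stats, stats_enable, stats_read and so on) is hypothetical rather than taken from the driver. The points illustrated are that a busy period in flight is folded into the total when tracking is disabled, and that the total is not cleared across disable/enable cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-engine stats; names are illustrative, not the driver's. */
struct busy_stats {
	bool enabled;
	unsigned int active;	/* contexts currently executing */
	uint64_t start_ns;	/* when the current busy period began */
	uint64_t total_ns;	/* accumulated busy time, never reset */
};

/* Enable tracking; active_now accounts for work already on the engine. */
static void stats_enable(struct busy_stats *s, unsigned int active_now,
			 uint64_t now_ns)
{
	s->enabled = true;
	s->active = active_now;
	if (active_now)
		s->start_ns = now_ns;
}

/* Disable tracking, folding any busy period in flight into the total. */
static void stats_disable(struct busy_stats *s, uint64_t now_ns)
{
	if (s->enabled && s->active)
		s->total_ns += now_ns - s->start_ns;
	s->enabled = false;
	s->active = 0;
}

/* A context starts executing; the first one opens a busy period. */
static void stats_context_in(struct busy_stats *s, uint64_t now_ns)
{
	if (!s->enabled)
		return;
	if (s->active++ == 0)
		s->start_ns = now_ns;
}

/* A context stops executing; the last one closes the busy period. */
static void stats_context_out(struct busy_stats *s, uint64_t now_ns)
{
	if (!s->enabled || s->active == 0)
		return;
	if (--s->active == 0)
		s->total_ns += now_ns - s->start_ns;
}

/* Sample the busy time, including any period still in progress. */
static uint64_t stats_read(const struct busy_stats *s, uint64_t now_ns)
{
	uint64_t total = s->total_ns;

	if (s->enabled && s->active)
		total += now_ns - s->start_ns;
	return total;
}
```

Because the total survives a disable, two samples taken around a disable/enable cycle during one long-running context still differ by the time actually spent busy while tracking was on, which is the property the v9 note wants for the PMU use case.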