1. 27 Jun, 2019 3 commits
  2. 18 Jun, 2019 1 commit
  3. 17 Jun, 2019 1 commit
  4. 14 Jun, 2019 2 commits
  5. 12 Jun, 2019 2 commits
  6. 10 Jun, 2019 1 commit
  7. 28 May, 2019 2 commits
  8. 03 May, 2019 1 commit
  9. 30 Apr, 2019 3 commits
  10. 26 Apr, 2019 1 commit
  11. 17 Apr, 2019 5 commits
  12. 11 Apr, 2019 1 commit
  13. 10 Apr, 2019 2 commits
  14. 08 Apr, 2019 1 commit
  15. 03 Apr, 2019 1 commit
  16. 25 Mar, 2019 2 commits
  17. 21 Mar, 2019 1 commit
  18. 20 Mar, 2019 1 commit
  19. 15 Mar, 2019 1 commit
  20. 14 Mar, 2019 1 commit
  21. 06 Mar, 2019 1 commit
  22. 05 Mar, 2019 1 commit
  23. 02 Mar, 2019 1 commit
  24. 21 Feb, 2019 1 commit
    • drm/i915: Reduce the RPS shock · 2a8862d2
      Committed by Chris Wilson
      Limit deboosting and boosting to keep ourselves at the extremes
      when in the respective power modes (i.e. slowly decrease frequencies
      while in the HIGH_POWER zone and slowly increase frequencies while
      in the LOW_POWER zone). On idle, we will hit the timeout and drop
      to the next level quickly, and conversely if busy we expect to
      hit a waitboost and rapidly switch into max power.
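
The damping described above can be sketched as a small standalone model. This is an illustration under stated assumptions, not the driver code: `struct rps_model`, `rps_step()` and the event names are hypothetical, and only the HIGH_POWER and LOW_POWER zones come from the commit message. It assumes that the step size doubles on repeated events in the same direction, and that the remembered step is forgotten whenever the adjustment moves away from the extreme of the current zone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the power zones; BETWEEN is the middle zone. */
enum power_mode { LOW_POWER, BETWEEN, HIGH_POWER };

/* Hypothetical events: the GPU looked busier (UP) or idler (DOWN). */
enum rps_event { EVT_UP, EVT_DOWN };

struct rps_model {
    enum power_mode mode;
    int last_adj; /* remembered step, reused to accelerate the next one */
};

/* Return the frequency step to apply for this event. */
static int rps_step(struct rps_model *rps, enum rps_event evt)
{
    int adj;

    assert(rps != NULL);
    adj = rps->last_adj;

    /* Assumed acceleration: keep doubling while the direction repeats. */
    if (evt == EVT_UP)
        adj = adj > 0 ? adj * 2 : 1;
    else
        adj = adj < 0 ? adj * 2 : -1;

    rps->last_adj = adj;

    /*
     * Damping: when stepping down in HIGH_POWER or up in LOW_POWER,
     * forget the step so that the next one starts again from 1.
     */
    if ((adj < 0 && rps->mode == HIGH_POWER) ||
        (adj > 0 && rps->mode == LOW_POWER))
        rps->last_adj = 0;

    return adj;
}
```

In this model, repeated down events in the HIGH_POWER zone each move the frequency by a single step, while the same events in the middle zone accelerate (1, 2, 4, ...). Rapid changes are left to the idle timeout and to waitboosting, which the sketch does not model.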
      
      This should improve the user experience by keeping the GPU clocks higher
      than they ostensibly should be (based on simple busyness) by switching
      into the INTERACTIVE mode (due to waiting for pageflips) and increasing
      clocks via waitboosting. This will cost some additional power; our
      saving grace should be rc6 and powergating, which keep the extra current
      draw in check.
      
      Food for future thought: deadline scheduling. If we know certain
      contexts (high priority compositors) absolutely must hit the next vblank
      then we can raise the frequencies ahead of time. Part of this is covered
      by per-context frequencies, where userspace is given control over the
      frequency range they want the GPU to execute at (for largely the same
      problem as this, where the workload is very latency sensitive but at the
      EI level appears mostly idle). Indeed, the per-context series does
      extend the modeset boosting to include a frequency range tweak which
      seems applicable to solving this jittery UX behaviour.
      Reported-by: Lyude Paul <lyude@redhat.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109408
      References: 0d55babc ("drm/i915: Drop stray clearing of rps->last_adj")
      References: 60548c55 ("drm/i915: Interactive RPS mode")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Eero Tamminen <eero.t.tamminen@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michel Thierry <michel.thierry@intel.com>
      
      Quoting Lyude Paul:
      > Before reverting 0d55babc: [4.20]
      >
      > 35 measurements [of gnome-shell animations]
      > Average: 33.65657142857143 FPS
      > FPS observed: 20.8 - 46.87 FPS
      > Percentage under 60 FPS: 100.0%
      > Percentage under 55 FPS: 100.0%
      > Percentage under 50 FPS: 100.0%
      > Percentage under 45 FPS: 97.14285714285714%
      > Percentage under 40 FPS: 97.14285714285714%
      > Percentage under 35 FPS: 45.714285714285715%
      > Percentage under 30 FPS: 11.428571428571429%
      > Percentage under 25 FPS: 2.857142857142857%
      >
      > After reverting: [4.19 behaviour]
      >
      > 30 measurements
      > Average: 49.833666666666666 FPS
      > FPS observed: 33.85 - 60.0 FPS
      > Percentage under 60 FPS: 86.66666666666667%
      > Percentage under 55 FPS: 70.0%
      > Percentage under 50 FPS: 53.333333333333336%
      > Percentage under 45 FPS: 20.0%
      > Percentage under 40 FPS: 6.666666666666667%
      > Percentage under 35 FPS: 6.666666666666667%
      > Percentage under 30 FPS: 0%
      > Percentage under 25 FPS: 0%
      >
      > Patched:
      > 42 measurements
      > Average: 46.05428571428571 FPS
      > FPS observed: 1.82 - 59.98 FPS
      > Percentage under 60 FPS: 88.09523809523809%
      > Percentage under 55 FPS: 61.904761904761905%
      > Percentage under 50 FPS: 45.23809523809524%
      > Percentage under 45 FPS: 35.714285714285715%
      > Percentage under 40 FPS: 33.33333333333333%
      > Percentage under 35 FPS: 19.047619047619047%
      > Percentage under 30 FPS: 7.142857142857142%
      > Percentage under 25 FPS: 4.761904761904762%
      Tested-by: Lyude Paul <lyude@redhat.com>
      Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190219122215.8941-13-chris@chris-wilson.co.uk
  25. 16 Feb, 2019 1 commit
  26. 30 Jan, 2019 2 commits
    • drm/i915: Replace global breadcrumbs with per-context interrupt tracking · 52c0fdb2
      Committed by Chris Wilson
      A few years ago, see commit 688e6c72 ("drm/i915: Slaughter the
      thundering i915_wait_request herd"), the issue of handling multiple
      clients waiting in parallel was brought to our attention. The
      requirement was that every client should be woken immediately upon its
      request being signaled, without incurring any cpu overhead.
      
      Handling certain fragility of our hw meant that we could not do a
      simple check inside the irq handler (some generations required almost
      unbounded delays before we could be sure of seqno coherency) and so
      request completion checking required delegation.
      
      Before commit 688e6c72, the solution was simple. Every client
      waiting on a request would be woken on every interrupt and each would do
      a heavyweight check to see if their request was complete. Commit
      688e6c72 introduced an rbtree so that only the earliest waiter on
      the global timeline would be woken, and it would then wake the next, and so on.
      (Along with various complications to handle requests being reordered
      along the global timeline, and also a requirement for a kthread to provide
      a delegate for fence signaling that had no process context.)
      
      The global rbtree depends on knowing the execution timeline (and global
      seqno). Without knowing that order, we must instead check all contexts
      queued to the HW to see which may have advanced. We trim that list by
      only checking queued contexts that are being waited on, but still we
      keep a list of all active contexts and their active signalers that we
      inspect from inside the irq handler. By moving the waiters onto the fence
      signal list, we can combine the client wakeup with the dma_fence
      signaling (a dramatic reduction in complexity, but it does require the HW
      to be coherent: the seqno must be visible from the CPU before the
      interrupt is raised; we keep a timer backup just in case).
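
The per-context scheme in the paragraph above can be modelled in a few lines. This is a minimal sketch, not the i915 implementation: `struct context`, `struct request`, `irq_signal_contexts()` and the singly linked lists are hypothetical stand-ins, the breadcrumb is modelled as plain memory, and setting `signaled` stands in for signaling the dma_fence and waking the waiter. It assumes each context's seqno only advances, and it compares seqnos with a wrap-safe signed difference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct request {
    uint32_t seqno;       /* position on its context's timeline */
    bool signaled;        /* stand-in for fence signal + waiter wakeup */
    struct request *next; /* next pending request on the same context */
};

struct context {
    const uint32_t *hwsp;    /* last seqno the GPU reported for this context */
    struct request *signals; /* requests being waited on, in submission order */
    struct context *next;    /* next context that has active signalers */
};

/* True if hw has reached seqno, allowing for 32-bit wraparound. */
static bool seqno_passed(uint32_t hw, uint32_t seqno)
{
    return (int32_t)(hw - seqno) >= 0;
}

/*
 * Called from the (modelled) interrupt handler: walk only the contexts
 * that have waiters and complete every request the GPU has passed.
 * Returns how many requests were signaled.
 */
static unsigned int irq_signal_contexts(struct context *signalers)
{
    unsigned int count = 0;

    for (struct context *ce = signalers; ce; ce = ce->next) {
        uint32_t hw = *ce->hwsp;

        while (ce->signals && seqno_passed(hw, ce->signals->seqno)) {
            struct request *rq = ce->signals;

            ce->signals = rq->next; /* unlink before signaling */
            rq->next = NULL;
            rq->signaled = true;
            count++;
        }
    }

    return count;
}
```

Because each context is checked against its own breadcrumb, no global execution order is needed. The cost is a walk over every context with pending signalers on each interrupt, which is why the list is trimmed to contexts that are actually being waited on.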
      
      Having previously fixed all the issues with irq-seqno serialisation (by
      inserting delays onto the GPU after each request instead of random delays
      on the CPU after each interrupt), we can rely on the seqno state to
      perform direct wakeups from the interrupt handler. This allows us to
      preserve our single context switch behaviour of the current routine,
      with the only downside that we lose the RT priority sorting of wakeups.
      In general, direct wakeup latency of multiple clients is about the same
      (about 10% better in most cases) with a reduction in total CPU time spent
      in the waiter (about 20-50% depending on gen). Average herd behaviour is
      improved, but at the cost of not delegating wakeups on task_prio.
      
      v2: Capture fence signaling state for error state and add comments to
      warm even the most cold of hearts.
      v3: Check if the request is still active before busywaiting
      v4: Reduce the amount of pointer misdirection with list_for_each_safe
      and using a local i915_request variable inside the loops
      v5: Add a missing pluralisation to a purely informative selftest message.
      
      References: 688e6c72 ("drm/i915: Slaughter the thundering i915_wait_request herd")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
    • drm/i915: Remove the intel_engine_notify tracepoint · 3df0bd19
      Committed by Chris Wilson
      The global seqno is defunct and so we have no meaningful indicator of
      forward progress for an engine. You need to listen to the request
      signaling tracepoints instead.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-1-chris@chris-wilson.co.uk