提交 · e7af3116836fb7feb985497f2c7776751fb27ef3 · openanolis / cloud-kernel

05 10月, 2017 1 次提交

drm/i915/execlists: Distinguish the incomplete context notifies · d6c05113

由 Chris Wilson 提交于 10月 03, 2017

Let the listener know that the context we just scheduled out was not
complete, and will be scheduled back in at a later point.

v2: Handle CONTEXT_STATUS_PREEMPTED in gvt by aliasing it to
CONTEXT_STATUS_OUT for the moment, gvt can expand upon the difference
later.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171003203453.15692-3-chris@chris-wilson.co.uk

d6c05113

04 10月, 2017 1 次提交

drm/i915: Remove use_mmio_flip modparm, v2. · 8279aaf5

由 Maarten Lankhorst 提交于 10月 04, 2017

This has been unused since commit afa8ce5b
("drm/i915: Nuke legacy flip queueing code").

Changes since v1:
- Rebase on top of all the changes to modparams.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
\o/-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171004094416.31306-1-maarten.lankhorst@linux.intel.com

8279aaf5

29 9月, 2017 3 次提交

drm/i915/execlists: Cache the last priolist lookup · 097a9481

由 Michał Winiarski 提交于 9月 28, 2017

Avoid the repeated rbtree lookup for each request as we unwind them by
tracking the last priolist.

v2: Fix up my unhelpful suggestion of using default_priolist.
Signed-off-by: NMichał Winiarski <michal.winiarski@intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-4-chris@chris-wilson.co.uk

097a9481

drm/i915: Give the invalid priority a magic name · 7d1ea609

由 Chris Wilson 提交于 9月 28, 2017

We use INT_MIN to denote the priority of a request that has not been
submitted to the scheduler; we treat INT_MIN as an invalid priority and
initialise the request to it. Give the value a name so it stands out.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-3-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com

7d1ea609

drm/i915/execlists: Move request unwinding to a separate function · 7e4992ac

由 Chris Wilson 提交于 9月 28, 2017

In the future, we will want to unwind requests following a preemption
point. This requires the same steps as for unwinding upon a reset, so
extract the existing code to a separate function for later use.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170928193910.17988-2-chris@chris-wilson.co.uk

7e4992ac

27 9月, 2017 1 次提交

drm/i915/execlists: Notify context-out for lost requests · 7e44fc28

由 Chris Wilson 提交于 9月 26, 2017

When cancelling requests, also send the notification to any listeners
(gvt) that the request is no longer scheduled on hw. They may require to
keep the in/out exactly balanced, and so the reuse after the reset may
confuse the listener.

Fixes: 221ab971 ("drm/i915/execlists: Unwind incomplete requests on resets")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: "Zhenyu Wang" <zhenyuw@linux.intel.com>
Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170926101720.9479-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>

7e44fc28

26 9月, 2017 1 次提交

drm/i915/execlists: Microoptimise execlists_cancel_port_request() · 3f9e6cd8

由 Chris Wilson 提交于 9月 25, 2017

Just rearrange the code slightly to trim the number of iterations
required.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170925124929.16974-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>

3f9e6cd8

25 9月, 2017 7 次提交

drm/i915/lrc: Skip no-op per-bb buffer on gen9 · b8aa2233

由 Chris Wilson 提交于 9月 21, 2017

Since we inherited the context image setup from gen8 which needed a
per-bb workaround (for GPGPU), we are submitting an empty per-bb buffer
on gen9. Now that we can skip adding the buffer to the context image,
remove the dangling per-bb. This slightly improves execution latency,
most notably on an idle engine.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87725Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-2-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

b8aa2233

drm/i915/lrc: Only enable per-context and per-bb buffers if set · 604a8f6f

由 Chris Wilson 提交于 9月 21, 2017

The per-context and per-batch workaround buffers are optional, yet we
tell the GPU to execute them even if they contain no instructions. Doing
so incurs the dispatch latency, which we can avoid if we don't ask the
GPU to execute the no-op buffers. Allow ourselves to skip setup of empty
buffer, and then to only enable non-empty buffers in the context image.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170921135444.27330-1-chris@chris-wilson.co.ukReviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

604a8f6f

drm/i915: Make execlist port count variable · 76e70087

由 Mika Kuoppala 提交于 9月 22, 2017

As we emulate execlists on top of the GuC workqueue, it is not
restricted to just 2 ports and we can increase that number arbitrarily
to trade-off queue depth (i.e. scheduling latency) against pipeline
bubbles.

v2: rebase. better commit msg (Chris)
v3: rebase
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-5-mika.kuoppala@intel.com

76e70087

drm/i915: Add execlist_port_complete · 7a62cc61

由 Mika Kuoppala 提交于 9月 22, 2017

When first execlist entry is processed, we move the port (contents).
Introduce function for this as execlist and guc use this common
operation.

v2: rebase. s/GEM_DEBUG_BUG/GEM_BUG (Chris)
v3: rebase
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-4-mika.kuoppala@intel.com

7a62cc61

drm/i915: Wrap port cancellation into a function · cf4591d1

由 Mika Kuoppala 提交于 9月 22, 2017

On reset and wedged path, we want to release the requests
that are tied to ports and then mark the ports to be unset.
Introduce a function for this.

v2: rebase
v3: drop local, keep GEM_BUG_ON (Michał, Chris)
v4: rebase

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-3-mika.kuoppala@intel.com

cf4591d1

drm/i915: Move execlist initialization into intel_engine_cs.c · 19df9a57

由 Mika Kuoppala 提交于 9月 22, 2017

Move execlist init into a common engine setup. As it is
common to both guc and hw execlists.

v2: rebase with csb changes
v3: rebase
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-2-mika.kuoppala@intel.com

19df9a57

drm/i915: Make own struct for execlist items · b620e870

由 Mika Kuoppala 提交于 9月 22, 2017

Engine's execlist related items have been increasing to
a point where a separate struct is warranted. Carve execlist
specific items to a dedicated struct to add clarity.

v2: add kerneldoc and fix whitespace (Joonas, Chris)
v3: csb_mmio changes, rebase
v4: s/\b(el|execlist)\b/execlists/ (Joonas)
Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> (v3)
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170922124307.10914-1-mika.kuoppala@intel.com

b620e870

22 9月, 2017 1 次提交

drm/i915: Rename global i915 to i915_modparams · 4f044a88

由 Michal Wajdeczko 提交于 9月 19, 2017

Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)
Signed-off-by: NMichal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com

4f044a88

18 9月, 2017 6 次提交

drm/i915/guc: Submit GuC workitems containing coalesced requests · 85e2fe67

由 Michał Winiarski 提交于 9月 14, 2017

To create an upper bound on number of GuC workitems, we need to change
the way that requests are being submitted. Rather than submitting each
request as an individual workitem, we can do coalescing in a similar way
we're handlig execlist submission ports. We also need to stop pretending
that we're doing "lite-restore" in GuC submission (we would create a
workitem each time we hit this condition). This allows us to completely
remove the reservation, replacing it with a compile time check.

v2: Also coalesce when replaying on reset (Daniele)
v3: Consistent wq_resv - per-request (Daniele)
v4: Squash removing wq_resv
v5: Reflect i915_guc_submit argument changes in doc
v6: Rebase on top of execlists reset/restart fix (Chris,Michał)

References: https://bugs.freedesktop.org/show_bug.cgi?id=101873
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: NMichał Winiarski <michal.winiarski@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170914083216.10192-2-michal.winiarski@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>

85e2fe67

drm/i915/execlists: Unwind incomplete requests on resets · 221ab971

由 Chris Wilson 提交于 9月 16, 2017

Given the mechanism to unwind and replay requests (designed to support
preemption), we have an alternative to the current method of
resubmitting the ELSP upon reset. Resubmitting ELSP turns out to be more
complicated than expected, due to having to handle lost context-switch
interrupts and so guessing what ELSP we need to resubmit later. Instead,
by unwinding the requests and clearing the ELSP tracking entirely, we
can then just dequeue the first pair of ready requests after resetting,
using the normal submission procedure.

Currently, the unwound requests have maximum priority and so are
guaranteed to be resubmitted upon resume. If we are lucky, we may be
able to coalesce a new request on top!
Suggested-by: NMichał Winiarski <michal.winiarski@intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-4-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>

221ab971

drm/i915/execlists: Split insert_request() · 27606fd8

由 Chris Wilson 提交于 9月 16, 2017

In the next patch we will want to reinsert a request not at the end of
the priority queue, but at the front. Here we split insert_request()
into two, the first function retrieves the priority list (for reuse for
unsubmit later) and a wrapper function to insert at the end of that list
and to schedule the tasklet if we were first.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-3-chris@chris-wilson.co.uk

27606fd8

drm/i915/execlists: Move insert_request() · 08dd3e1a

由 Chris Wilson 提交于 9月 16, 2017

Move insert_request() earlier to avoid a forward declaration in a later
patch.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-2-chris@chris-wilson.co.uk

08dd3e1a

drm/i915/execlists: Kick start request processing after a reset · 523e7c92

由 Chris Wilson 提交于 9月 16, 2017

During a reset, we may skip over completed requests and lost
context-switch interrupts. Following the reset, we may then may end up
with no active requests in the ELSP (and so do not resubmit to restart
the engine), but have a queue of requests ready for execution. This is
unlikely, it requires the last request to complete after the hang is
detected, but not impossible. The outcome of this is that the engine
stalls, possibly leading to full ring and indefinite wait under
struct_mutex, eventually leading to a full driver hang.

Alternatively, we can solve this by unsubmitting the incomplete requests
and just kickstarting the tasklet. Michał has patches for that, which I
initially disliked due to the extra complexity, but the complexity of
this "simple" restart is growing...
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170916204414.32762-1-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>

523e7c92

drm/i915: Cancel all ready but queued requests when wedging · 27a5f61b

由 Chris Wilson 提交于 9月 15, 2017

When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).
Reported-by: NMichał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.ukReviewed-by: NMichał Winiarski <michal.winiarski@intel.com>

27a5f61b

14 9月, 2017 2 次提交

drm/i915/execlists: Read the context-status HEAD from the HWSP · 767a983a

由 Chris Wilson 提交于 9月 13, 2017

The engine also provides a mirror of the CSB write pointer in the HWSP,
but not of our read pointer. To take advantage of this we need to
remember where we read up to on the last interrupt and continue off from
there. This poses a problem following a reset, as we don't know where
the hw will start writing from, and due to the use of power contexts we
cannot perform that query during the reset itself. So we continue the
current modus operandi of delaying the first read of the context-status
read/write pointers until after the first interrupt. With this we should
now have eliminated all uncached mmio reads in handling the
context-status interrupt, though we still have the uncached mmio writes
for submitting new work, and many uncached mmio reads in the global
interrupt handler itself. Still a step in the right direction towards
reducing our resubmit latency, although it appears lost in the noise!

v2: Cannonlake moved the CSB write index
v3: Include the sw/hwsp state in debugfs/i915_engine_info
v4: Also revert to using CSB mmio for GVT-g
v5: Prevent the compiler reloading tail (Mika)
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: NMichel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-6-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>

767a983a

drm/i915/execlists: Read the context-status buffer from the HWSP · 6d2cb5aa

由 Chris Wilson 提交于 9月 13, 2017

The engine provides a mirror of the CSB in the HWSP. If we use the
cacheable reads from the HWSP, we can shave off a few mmio reads per
context-switch interrupt (which are quite frequent!). Just removing a
couple of mmio is not enough to actually reduce any latency, but a small
reduction in overall cpu usage.

Much appreciation for Ben dropping the bombshell that the CSB was in the
HWSP and for Michel in digging out the details.

v2: Don't be lazy, add the defines for the indices.
v3: Include the HWSP in debugfs/i915_engine_info
v4: Check for GVT-g, it currently depends on intercepting CSB mmio
v5: Fixup GVT-g mmio path
v6: Disable HWSP if VT-d is active as the iommu adds unpredictable
memory latency. (Mika)
v7: Also markup the CSB read with READ_ONCE() as it may still be an mmio
read and we want to stop the compiler from issuing a later (v.slow) reload.
Suggested-by: NBen Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Acked-by: NMichel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913133534.26927-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>

6d2cb5aa

13 9月, 2017 3 次提交

drm/i915/lrc: allocate separate page for HWSP · 486e93f7

由 Daniele Ceraolo Spurio 提交于 9月 13, 2017

On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.

v2: Add a use-case for the register arena of the HWSP.
Signed-off-by: NDaniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.comSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMichel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk

486e93f7

drm/i915/lrc: Clarify the format of the context image · 0b29c75a

由 Michel Thierry 提交于 9月 13, 2017

Not only the context image consist of two parts (the PPHWSP, and the
logical context state), but we also allocate a header at the start of
for sharing data with GuC. Thus every lrc looks like this:

  | [guc]          | [hwsp] [logical state] |
  |<- our header ->|<- context image      ->|

So far, we have oversimplified whenever we use each of these parts of the
context, just because the GuC header happens to be in page 0, and the
(PP)HWSP is in page 1. But this had led to using the same define for more
than one meaning (as a page index in the lrc and as 1 page).

This patch adds defines for the GuC shared page, the PPHWSP page and the
start of the logical state. It also updated the places where the old
define was being used. Since we are not changing the size (or format) of
the context, there are no functional changes.

v2: Use PPHWSP index for hws again.
Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Link: http://patchwork.freedesktop.org/patch/msgid/20170712193032.27080-1-michel.thierry@intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-1-chris@chris-wilson.co.uk

0b29c75a

drm/i915: Move the context descriptor to an inline helper · 2013ddeb

由 Chris Wilson 提交于 9月 12, 2017

The context descriptor is stored inside the per-engine context state, as
we only need to compute it once and access it frequently. However,
currently only intel_lrc.c has easy access, but i915_guc_submission.c
would like to frequently read it as well, and more so only ever needs
the lower 32bits. Make it an inline as the compiler should be able to
retrieve the value in less instructions than it takes to do the function
call:

add/remove: 0/1 grow/shrink: 1/0 up/down: 8/-45 (-37)
function old new delta
i915_guc_submit 621 629 +8
intel_lr_context_descriptor 45 - -45
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20170912214905.21987-1-chris@chris-wilson.co.ukReviewed-by: NOscar Mateo <oscar.mateo@intel.com>

2013ddeb

21 8月, 2017 1 次提交

drm/i915: Clear lost context-switch interrupts across reset · d41a3c2b

由 Chris Wilson 提交于 8月 07, 2017

During a global reset, we disable the irq. As we disable the irq, the
hardware may be raising a GT interrupt that we then ignore, leaving it
pending in the GTIIR. After the reset, we then re-enable the irq,
triggering the pending interrupt. However, that interrupt was for the
stale state from before the reset, and the contents of the CSB buffer
are now invalid.

v2: Add a comment to make it clear that the double clear is purely my
paranoia.
Reported-by: N"Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.ukReviewed-by: NMichel Thierry <michel.thierry@intel.com>
(cherry picked from commit 64f09f00)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

d41a3c2b

19 8月, 2017 2 次提交

drm/i915/cnl: Introduce initial Cannonlake Workarounds. · 90007bca

由 Rodrigo Vivi 提交于 8月 15, 2017

Let's inherit workarounds from previous platforms that
according to wa_database and BSpec are still valid for
Cannonlake.

v2: Add missed workarounds.
v3: Rebase
v4: Remove bad chunk that was added to rc6 disable. (Ander)
    Also remove A0 W/a that are not needed anymore.
v5: Rebase on top of CFL.
v6: Remove empty gen9_init_perctx_bb and gen9_init_indirectctx_bb
    since they don't carry any gen10 related W/a. (by Oscar).
    Also Remove A0 exclusive workaround.
v7: Remove more A0 exclusive workarounds. As pointed out by Oscar
    many workarounds were changed to be A0 only so let's remove
    them.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: NOscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170815231651.975-1-rodrigo.vivi@intel.com

90007bca

drm/i915: Clear lost context-switch interrupts across reset · 64f09f00

由 Chris Wilson 提交于 8月 07, 2017

During a global reset, we disable the irq. As we disable the irq, the
hardware may be raising a GT interrupt that we then ignore, leaving it
pending in the GTIIR. After the reset, we then re-enable the irq,
triggering the pending interrupt. However, that interrupt was for the
stale state from before the reset, and the contents of the CSB buffer
are now invalid.

v2: Add a comment to make it clear that the double clear is purely my
paranoia.
Reported-by: N"Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: "Dong, Chuanxiao" <chuanxiao.dong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170807121919.30165-1-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20170818090509.5363-1-chris@chris-wilson.co.ukReviewed-by: NMichel Thierry <michel.thierry@intel.com>

64f09f00

27 7月, 2017 1 次提交

drm/i915: Flush the execlist ports if idle · cdb6ded4

由 Chris Wilson 提交于 7月 21, 2017

When doing a GPU reset, the CSB register will be trashed and we will
lose any context-switch notifications that happened since the tasklet
was disabled. If we find that all requests on this engine were
completed, we want to make sure that the ELSP tracker is similarly empty
so that we do not feed back in the completed requests upon recovering
from the reset.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721123238.16428-4-chris@chris-wilson.co.ukSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cdb6ded4

21 6月, 2017 1 次提交

drm/i915: Group all the global context information together · 829a0af2

由 Chris Wilson 提交于 6月 20, 2017

Create a substruct to hold all the global context state under
drm_i915_private.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170620110547.15947-1-chris@chris-wilson.co.uk

829a0af2

19 6月, 2017 1 次提交

drm/i915: Differentiate between sw write location into ring and last hw read · a21ef715

由 Chris Wilson 提交于 6月 15, 2017

We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).

v2: Refactor intel_ring_reset() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
(cherry picked from commit e6ba9992)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615131129.3061-1-chris@chris-wilson.co.uk

a21ef715

15 6月, 2017 1 次提交

drm/i915/perf: Add OA unit support for Gen 8+ · 19f81df2

由 Robert Bragg 提交于 6月 13, 2017

Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.

Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.

The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).

This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.

The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).

Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.

v5: Drain submitted requests when enabling metric set to ensure no
    lite-restore erases the context image we just updated (Lionel)

v6: In addition to drain, switch to kernel context & update all
    context in place (Chris)

v7: Add missing mutex_unlock() if switching to kernel context fails
    (Matthew)

v8: Simplify OA period/flex-eu-counters programming by using the
    batchbuffer instead of modifying ctx-image (Lionel)

v9: Back to updating the context image (due to erroneous testing,
    batchbuffer programming the OA unit doesn't actually work)
    (Lionel)
    Pin context before updating context image (Chris)
    Drop MMIO programming now that we switch to a kernel context with
    right values in initial context image (Chris)

v10: Just pin_map the contexts we want to modify or let the
     configuration happen on first use (Chris)

v11: Update kernel context OA config through the batchbuffer rather
     than on the fly ctx-image update (Lionel)

v12: Rework OA context registers update again by swithing away from
     user contexts and reconfiguring the kernel context through the
     batchbuffer and updating all the other contexts' context image.
     Also take care to lock slice/subslice configuration when OA is
     on. (Lionel)

v13: Request rpcs updates on all engine when updating the OA config
     (Lionel)

v14: Drop any kind of rpcs management now that we monitor sseu
     configuration changes in a later patch (Lionel)
     Remove usleep after programming the NOA configs on Gen8+, this
     doesn't seem to be needed (Lionel)

v15: Respect coding style for block comments (Chris)

v16: Add missing i915_add_request() in case we fail to emit OA
     configuration (Matthew)
Signed-off-by: NRobert Bragg <robert@sixbynine.org>
Signed-off-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

19f81df2

07 6月, 2017 1 次提交

drm/i915/gen10: Set value of Indirect Context Offset for gen10 · 7bd0a2c6

由 Michel Thierry 提交于 6月 06, 2017

Indirect Context Offset Pointer has changed for Cannonlake.

INDIRECT_CTX_OFFSET[15:6] valid value for CNL is 19h per Spec.

v2: rebased to intel_lr_indirect_ctx_offset

v3: Commit message added per Tvrtko request.
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1496781040-20888-9-git-send-email-rodrigo.vivi@intel.com

7bd0a2c6

19 5月, 2017 1 次提交

drm/i915: set initialised only when init_context callback is NULL · 46cd902f

由 Chuanxiao Dong 提交于 5月 11, 2017

During execlist_context_deferred_alloc() we presumed that the context is
uninitialised (we only just allocated the state object for it!) and
chose to optimise away the later call to engine->init_context() if
engine->init_context were NULL. This breaks with GVT's contexts that are
marked as pre-initialised to avoid us annoyingly calling
engine->init_context(). The fix is to not override ce->initialised if it
is already true.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChuanxiao Dong <chuanxiao.dong@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 0d402a24)
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

46cd902f

17 5月, 2017 4 次提交

drm/i915/execlists: Reduce lock contention between schedule/submit_request · 349bdb68

由 Chris Wilson 提交于 5月 17, 2017

If we do not require to perform priority bumping, and we haven't yet
submitted the request, we can update its priority in situ and skip
acquiring the engine locks -- thus avoiding any contention between us
and submit/execute.

v2: Remove the stack element from the list if we can do the early
assignment.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-10-chris@chris-wilson.co.uk

349bdb68

drm/i915: Create a kmem_cache to allocate struct i915_priolist from · c5cf9a91

由 Chris Wilson 提交于 5月 17, 2017

The i915_priolist are allocated within an atomic context on a path where
we wish to minimise latency. If we use a dedicated kmem_cache, we have
the advantage of a local freelist from which to service new requests
that should keep the latency impact of an allocation small. Though
currently we expect the majority of requests to be at default priority
(and so hit the preallocate priolist), once userspace starts using
priorities they are likely to use many fine grained policies improving
the utilisation of a private slab.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-9-chris@chris-wilson.co.uk

c5cf9a91

drm/i915: Split execlist priority queue into rbtree + linked list · 6c067579

由 Chris Wilson 提交于 5月 17, 2017

All the requests at the same priority are executed in FIFO order. They
do not need to be stored in the rbtree themselves, as they are a simple
list within a level. If we move the requests at one priority into a list,
we can then reduce the rbtree to the set of priorities. This should keep
the height of the rbtree small, as the number of active priorities can not
exceed the number of active requests and should be typically only a few.

Currently, we have ~2k possible different priority levels, that may
increase to allow even more fine grained selection. Allocating those in
advance seems a waste (and may be impossible), so we opt for allocating
upon first use, and freeing after its requests are depleted. To avoid
the possibility of an allocation failure causing us to lose a request,
we preallocate the default priority (0) and bump any request to that
priority if we fail to allocate it the appropriate plist. Having a
request (that is ready to run, so not leading to corruption) execute
out-of-order is better than leaking the request (and its dependency
tree) entirely.

There should be a benefit to reducing execlists_dequeue() to principally
using a simple list (and reducing the frequency of both rbtree iteration
and balancing on erase) but for typical workloads, request coalescing
should be small enough that we don't notice any change. The main gain is
from improving PI calls to schedule, and the explicit list within a
level should make request unwinding simpler (we just need to insert at
the head of the list rather than the tail and not have to make the
rbtree search more complicated).

v2: Avoid use-after-free when deleting a depleted priolist

v3: Michał found the solution to handling the allocation failure
gracefully. If we disable all priority scheduling following the
allocation failure, those requests will be executed in fifo and we will
ensure that this request and its dependencies are in strict fifo (even
when it doesn't realise it is only a single list). Normal scheduling is
restored once we know the device is idle, until the next failure!
Suggested-by: NMichał Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-8-chris@chris-wilson.co.uk

6c067579

drm/i915/execlists: Pack the count into the low bits of the port.request · 77f0d0e9

由 Chris Wilson 提交于 5月 17, 2017

add/remove: 1/1 grow/shrink: 5/4 up/down: 391/-578 (-187)
function                                     old     new   delta
execlists_submit_ports                       262     471    +209
port_assign.isra                               -     136    +136
capture                                     6344    6359     +15
reset_common_ring                            438     452     +14
execlists_submit_request                     228     238     +10
gen8_init_common_ring                        334     341      +7
intel_engine_is_idle                         106     105      -1
i915_engine_info                            2314    2290     -24
__i915_gem_set_wedged_BKL                    485     411     -74
intel_lrc_irq_handler                       1789    1604    -185
execlists_update_context                     294       -    -294

The most important change there is the improve to the
intel_lrc_irq_handler and excclist_submit_ports (net improvement since
execlists_update_context is now inlined).

v2: Use the port_api() for guc as well (even though currently we do not
pack any counters in there, yet) and hide all port->request_count inside
the helpers.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170517121007.27224-5-chris@chris-wilson.co.uk

77f0d0e9

12 5月, 2017 1 次提交

drm/i915: set initialised only when init_context callback is NULL · 0d402a24

由 Chuanxiao Dong 提交于 5月 11, 2017

During execlist_context_deferred_alloc() we presumed that the context is
uninitialised (we only just allocated the state object for it!) and
chose to optimise away the later call to engine->init_context() if
engine->init_context were NULL. This breaks with GVT's contexts that are
marked as pre-initialised to avoid us annoyingly calling
engine->init_context(). The fix is to not override ce->initialised if it
is already true.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChuanxiao Dong <chuanxiao.dong@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1494497262-24855-1-git-send-email-chuanxiao.dong@intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

0d402a24

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功