提交 · e7778be1eab918274f79603d7c17b3ec8be77386 · openeuler / raspberrypi-kernel

03 12月, 2014 14 次提交

drm/i915: Fix startup failure in LRC mode after recent init changes · e7778be1

由 Thomas Daniel 提交于 12月 02, 2014

A previous commit introduced engine init changes:

    commit 372ee59699d9 ("drm/i915: Only init engines once")

This broke execlists as intel_lr_context_render_state_init was trying to emit
commands to the RCS for the default context before the ring->init_hw was called.

Made a new gen8_init_rcs_context function and assign in to render ring
init_context.  Moved call to intel_logical_ring_workarounds_emit into
gen8_init_rcs_context to maintain previous functionality.

Moved call to render_state_init from lr_context_deferred_create into
gen8_init_rcs_context, and modified deferred_create to call ring->init_context
for non-default contexts.

Modified i915_gem_context_enable to call ring->init_context for the default
context.

So init_context will now always be called when the hw is ready - in
i915_gem_context_enable for the default context and in lr_context_deferred_create
for other contexts.
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e7778be1

drm/i915: Flatten engine init control flow · bfc882b4

由 Daniel Vetter 提交于 11月 20, 2014

Now that sanity prevails and we have the clean split between software
init and starting the engines we can drop all the "have we allocate
this struct already?" nonsense.

Execlist code could benefit quite a bit more still, but that's for
another patch.
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

bfc882b4

drm/i915: Only init engines once · 35a57ffb

由 Daniel Vetter 提交于 11月 20, 2014

We can do this.

And now there's finally the clean split between software setup and
hardware setup I kinda wanted since multi-ring support was merged
aeons ago. It only took almost 5 years.
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

35a57ffb

drm/i915: Move intel_init_pipe_control out of engine->init_hw · 99be1dfe

由 Daniel Vetter 提交于 11月 20, 2014

With this all the ->init_hw hooks really only set up hw state needed
to start the ring, all the software state setup and memory/buffer
allocations happen beforehand.

v2: We need to call intel_init_pipe_control after the ring init since
otherwise engine->dev is NULL and it falls over. Currently that's
now after the hw ring is enabled but a) we'll be fine as long as no
one submits a batch b) this will change soon.
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

99be1dfe

drm/i915: s/init()/init_hw()/ in intel_engine_cs · ecfe00d8

由 Daniel Vetter 提交于 11月 20, 2014

This is (mostly, some exceptions that need fixing) the hw setup
function which starts the ring. And not the function which allocates
all the resources.

Make this clear by giving it a better name.
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ecfe00d8

drm/i915: Consolidate ring freespace calculations · ebd0fd4b

由 Dave Gordon 提交于 11月 27, 2014

There are numerous places in the code where the driver's idea of
how much space is left in a ring is updated using the driver's
latest notions of the positions of 'head' and 'tail' for the ring.
Among them are some that update one or both of these values before
(re)doing the calculation. In particular, there are four different
places in the code where 'last_retired_head' is copied to 'head'
and then set to -1; and two of these do not have a guard to check
that it has actually been updated since last time it was consumed,
leaving the possibility that the dummy -1 can be transferred from
'last_retired_head' to 'head', causing the space calculation to
produce 'impossible' results (previously seen on Android/VLV).

This code therefore consolidates all the calculation and updating of
these values, such that there is only one place where the ring space
is updated, and it ALWAYS uses (and consumes) 'last_retired_head' if
(and ONLY if) it has been updated since the last call.
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ebd0fd4b

drm/i915: Connect requests to rings at creation not submission · ff79e857

由 John Harrison 提交于 11月 24, 2014

It makes a lot more sense (and makes future seqno -> request conversion patches
simpler) to fill in the 'ring' field of the request structure at the point of
creation rather than submission. Given that the request structure is assigned by
ring specific code and thus is locked to a ring from the start, there really is
no reason to defer this assignment.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ff79e857

drm/i915: Remove obsolete seqno parameter from 'i915_add_request' · 9400ae5c

由 John Harrison 提交于 11月 24, 2014

There is no longer any need to retrieve a seqno value from an i915_add_request()
call. The calling code already knows which request structure is being processed
(it can only be ring->OLR). And as the request itself is now used in preference
to the basic seqno value, the latter is now redundant in this situation.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9400ae5c

drm/i915: Convert i915_wait_seqno to i915_wait_request · a4b3a571

由 Daniel Vetter 提交于 11月 26, 2014

Updated i915_wait_seqno() to take a request structure instead of a seqno value
and renamed it accordingly. Internally, it just pulls the seqno out of the
request and calls on to __wait_seqno() as before. However, all the code further
up the stack is now simplified as it can just pass the request object straight
through without having to peek inside.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
[danvet: Squash in hunk from an earlier patch which was rebased
wrongly.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a4b3a571

drm/i915: Remove 'outstanding_lazy_seqno' · 6259cead

由 John Harrison 提交于 11月 24, 2014

The OLS value is now obsolete. Exactly the same value is guarateed to be always
available as PLR->seqno. Thus it is safe to remove the OLS completely. And also
to rename the PLR to OLR to keep the 'outstanding lazy ...' naming convention
valid.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6259cead

drm/i915: Add reference count to request structure · abfe262a

由 John Harrison 提交于 11月 24, 2014

The plan is to use request structures everywhere that seqno values were
previously used. This means saving pointers to structures in places that used to
be simple integers. In turn, that means that the target structure now needs much
more stringent lifetime tracking. That is, it must not be freed while some other
random object still holds a pointer to it.

To achieve this tracking, a reference count needs to be added. Whenever a
pointer to the structure is saved away, the count must be incremented and the
free must only occur when all references have been released.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

abfe262a

drm/i915: Ensure OLS & PLR are always in sync · 9eba5d4a

由 John Harrison 提交于 11月 24, 2014

The aim is to replace seqno values with request structures. A step along the way
is to switch to using the PLR in preference to the OLS. That requires the PLR to
only be valid when and only when the OLS is also valid. I.e., the two must be
kept in lock step. Then, code which was using the OLS can be safely switched
over to using the PLR instead.

For: VIZ-4377
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NThomas Daniel <Thomas.Daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9eba5d4a

drm/i915: Don't read 'HEAD' MMIO register in LRC mode · d65621c4

由 Dave Gordon 提交于 11月 18, 2014

The logical ring code was updating the software ring 'head' value
by reading the hardware 'HEAD' register. In LRC mode, this is not
valid as the hardware is not necessarily executing the same context
that is being processed by the software. Thus reading the h/w HEAD
could put an unrelated (undefined, effectively random) value into
the s/w 'head' -- A Bad Thing for the free space calculations.
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d65621c4

drm/i915: Check for matching ringbuffer in logical_ring_wait_request() · 57e21513

由 Dave Gordon 提交于 11月 18, 2014

The request queue is per-engine, and may therefore contain requests
from several different contexts/ringbuffers. In determining which
request to wait for, this function should only consider requests
from the ringbuffer that it's checking for space, and ignore any
that it finds that belong to other contexts.
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

57e21513

20 11月, 2014 3 次提交

drm/i915/bdw: Pin the ringbuffer backing object to GGTT on-demand · 7ba717cf

由 Thomas Daniel 提交于 11月 13, 2014

Same as with the context, pinning to GGTT regardless is harmful (it
badly fragments the GGTT and can even exhaust it).

Unfortunately, this case is also more complex than the previous one
because we need to map and access the ringbuffer in several places
along the execbuffer path (and we cannot make do by leaving the
default ringbuffer pinned, as before). Also, the context object
itself contains a pointer to the ringbuffer address that we have to
keep updated if we are going to allow the ringbuffer to move around.

v2: Same as with the context pinning, we cannot really do it during
an interrupt. Also, pin the default ringbuffers objects regardless
(makes error capture a lot easier).

v3: Rebased. Take a pin reference of the ringbuffer for each item
in the execlist request queue because the hardware may still be using
the ringbuffer after the MI_USER_INTERRUPT to notify the seqno update
is executed.  The ringbuffer must remain pinned until the context save
is complete.  No longer pin and unpin ringbuffer in
populate_lr_context() - this transient address is meaningless and the
pinning can cause a sleep while atomic.

v4: Moved ringbuffer pin and unpin into the lr_context_pin functions.
Downgraded pinning check BUG_ONs to WARN_ONs.

v5: Reinstated WARN_ONs for unexpected execlist states.  Removed unused
variable.

Issue: VIZ-4277
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NAkash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7ba717cf

drm/i915/bdw: Pin the context backing objects to GGTT on-demand · dcb4c12a

由 Oscar Mateo 提交于 11月 13, 2014

Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).

This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).

v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(

v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.

v4: Break out pin and unpin into functions.  Fix style problems reported
by checkpatch

v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked.  Add WARN_ONs to make sure this is the case in future.

Issue: VIZ-4277
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NAkash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

dcb4c12a

drm/i915/bdw: Clean up execlist queue items in retire_work · c86ee3a9

由 Thomas Daniel 提交于 11月 13, 2014

No longer create a work item to clean each execlist queue item.
Instead, move retired execlist requests to a queue and clean up the
items during retire_requests.

v2: Fix legacy ring path broken during overzealous cleanup

v3: Update idle detection to take execlists queue into account

v4: Grab execlist lock when checking queue state

v5: Fix leaking requests by freeing in execlists_retire_requests.

Issue: VIZ-4274
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
Reviewed-by: NAkash Goel <akash.goels@gmail.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

c86ee3a9

18 11月, 2014 1 次提交

drm/i915: Drop return value from lrc_setup_hardware_status_page · 70b0ea86

由 Daniel Vetter 提交于 11月 18, 2014

kmap never fails.
Spotted-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

70b0ea86

15 11月, 2014 1 次提交

drm/i915/skl: Don't allow disabling ppgtt and execlists on gen9+ · 70ee45e1

由 Damien Lespiau 提交于 11月 14, 2014

Running the driver without execlists and hence PPGTT (either aliasing or
full) isn't a supported configuration on gen9+.
Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

70ee45e1

14 11月, 2014 3 次提交

drm/i915/skl: Use correct use counters for force wakes · 6e7cc470

由 Tvrtko Ursulin 提交于 11月 13, 2014

Write and reads following the block changed use engine specific use counters
and unless that is matched here force wake use counting goes bad. Same
force wake is attempted to be taken twice which leads to at least time outs.

NOTE: Depending on feedback from hardware designers it may not be necessary
to grab force wakes on Gen9 here. But for Gen8 it is needed due to a race
between RC6 and ELSP writes.

v2: Added blitter force wake engine and made more future proof.
    Added commit note.
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6e7cc470

drm/i915/skl: Add Gen9 LRC size · 468c6816

由 Michael H. Nguyen 提交于 11月 13, 2014

The LRC increased in size on gen9. Make sure we return the right
size in get_lr_context_size()

v2. Corrected the size, should be 22 pages. I unintentionally mailed out
a test patch w/ size equaling 23 pages.
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NMichael H. Nguyen <michael.h.nguyen@intel.com>
Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

468c6816

drm/i915: Initialize workarounds in logical ring mode too · 771b9a53

由 Michel Thierry 提交于 11月 11, 2014

Following the legacy ring submission example, update the
ring->init_context() hook to support the execlist submission mode.

v2: update to use the new workaround macros and cleanup unused code.
This takes care of both bdw and chv workarounds.

v2.1: Add missing call to init_context() during deferred context creation.

v3: Split init_context (emit) in legacy/lrc modes. For lrc, get the ringbuf
from the context (Mika/Daniel).

v4: Merge init_context interfaces back, the legacy mode only needs the ring,
but the lrc mode needs the ring and context (Mika).

Issue: VIZ-4092
Issue: GMIN-3475
Change-Id: Ie3d093b2542ab0e2a44b90460533e2f979788d6c
Cc: Deepak S <deepak.s@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Signed-off-by: NArun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
[danvet: Align function paramater lists properly.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

771b9a53

08 11月, 2014 1 次提交

drm/i915/bdw: Setup global hardware status page in execlists mode · 1df06b75

由 Thomas Daniel 提交于 10月 29, 2014

Write HWS_PGA address even in execlists mode as the global hardware status
page is still required.  This address was previously uninitialized and
HWSP writes would clobber whatever buffer happened to reside at GGTT
address 0.

v2: Break out hardware status page setup into a separate function.

Issue: VIZ-2020
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1df06b75

05 11月, 2014 2 次提交

drm/i915: Remove redundant return value and WARN_ON · cd0707cb

由 Dave Gordon 提交于 10月 30, 2014

execlists_submit_context() always returns 0, which is redundant.
And its name is inaccurate, since it actually submits (up to)
TWO contextS. So we rename it, change it to "void", and remove
the WARN_ON() testing its return value.

Change-Id: Ie225b0eca7754c6093c8b8bd15550b251b6feb82
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cd0707cb

drm/i915: Fix null pointer dereference in ring cleanup code · 6402c330

由 John Harrison 提交于 10月 31, 2014

If a ring failed to initialise for any reason then the error path would try to
clean up all rings including those that had not yet been allocated. The ring
clean up code did a check that the ring was valid before starting its work.
Unfortunately, that was after it had already dereferenced the ring to obtain a
dev_private pointer.
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6402c330

21 10月, 2014 1 次提交

gpu:drm: Fix typo in Documentation/DocBook/drm.xml · 32197aab

由 Masanari Iida 提交于 10月 20, 2014

This patch fix spelling typos found in drm.xml.
It is because the file is generated from comments in
source codes, I have to fix the typos within source files.
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

32197aab

19 9月, 2014 2 次提交

drm/i915: Fix irq checks in ring->irq_get/put functions · 7cd512f1

由 Daniel Vetter 提交于 9月 15, 2014

Yet another place that wasn't properly transformed when implementing
SOix. While at it convert the checks to WARN_ON on gen5+ (since we
don't have UMS potentially doing stupid things on those platforms).
And also add the corresponding checks to the put functions (again with
a WARN_ON) for gen5+.

v2: Drop the WARNINGS in the irq_put functions (including the existing
one for vebox), Chris convinced me that they're not that terribly
useful.

v3: Don't forget about execlist code.

Cc: Imre Deak <imre.deak@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: "Volkin, Bradley D" <bradley.d.volkin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

7cd512f1

drm/i915: add cherryview specfic forcewake in execlists_elsp_write · a01b0e94

由 Deepak S 提交于 9月 09, 2014

In chv, we have two power wells Render & Media. We need to use
corresponsing forcewake count. If we dont follow this we are getting
error "*ERROR*: Timed out waiting for forcewake old ack to clear" due to
multiple entry into __vlv_force_wake_get.
Signed-off-by: NDeepak S <deepak.s@linux.intel.com>
Requested-by: NJesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a01b0e94

03 9月, 2014 2 次提交

drm/i915/bdw: Render state init for Execlists · 564ddb2f

由 Oscar Mateo 提交于 8月 21, 2014

The batchbuffer that sets the render context state is submitted
in a different way, and from different places.

We needed to make both the render state preparation and free functions
outside accesible, and namespace accordingly. This mess is so that all
LR, LRC and Execlists functionality can go together in intel_lrc.c: we
can fix all of this later on, once the interfaces are clear.

v2: Create a separate ctx->rcs_initialized for the Execlists case, as
suggested by Chris Wilson.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>

v3: Setup ring status page in lr_context_deferred_create when the
default context is being created. This means that the render state
init for the default context is no longer a special case.  Execute
deferred creation of the default context at the end of
logical_ring_init to allow the render state commands to be submitted.
Fix style errors reported by checkpatch. Rebased.
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

564ddb2f

drm/i915/bdw: Populate lrc with aliasing ppgtt if required · 2d965536

由 Thomas Daniel 提交于 8月 19, 2014

A previous commit broke aliasing PPGTT for lrc, resulting in a kernel oops
on boot. Add a check so that is full PPGTT is not in use the context is
populated with the aliasing PPGTT.

Issue: VIZ-4278
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2d965536

20 8月, 2014 2 次提交

drm/i915/bdw: Document Logical Rings, LR contexts and Execlists · 73e4d07f

由 Oscar Mateo 提交于 7月 24, 2014

Add theory of operation notes to intel_lrc.c and comments to externally
visible functions.

v2: Add notes on logical ring context creation.

v3: Use kerneldoc.

v4: Integrate it in the DocBook template.

Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v3)
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Drop hunk about render ring init function since that's not
yet merged.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

73e4d07f

drm/i915/bdw: Display execlists info in debugfs · 4ba70e44

由 Oscar Mateo 提交于 8月 07, 2014

v2: Warn and return if LRCs are not enabled.

v3: Grab the Execlists spinlock (noticed by Daniel Vetter).
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>

v4: Lock the struct mutex for atomic state capture
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4ba70e44

15 8月, 2014 8 次提交

drm/i915/bdw: Help out the ctx switch interrupt handler · f1ad5a1f

由 Oscar Mateo 提交于 7月 24, 2014

If we receive a storm of requests for the same context (see gem_storedw_loop_*)
we might end up iterating over too many elements in interrupt time, looking for
contexts to squash together. Instead, share the burden by giving more
intelligence to the queue function. At most, the interrupt will iterate over
three elements.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f1ad5a1f

drm/i915/bdw: Avoid non-lite-restore preemptions · e1fee72c

由 Oscar Mateo 提交于 7月 24, 2014

In the current Execlists feeding mechanism, full preemption is not
supported yet: only lite-restores are allowed (this is: the GPU
simply samples a new tail pointer for the context currently in
execution).

But we have identified an scenario in which a full preemption occurs:
1) We submit two contexts for execution (A & B).
2) The GPU finishes with the first one (A), switches to the second one
(B) and informs us.
3) We submit B again (hoping to cause a lite restore) together with C,
but in the time we spend writing to the ELSP, the GPU finishes B.
4) The GPU start executing B again (since we told it so).
5) We receive a B finished interrupt and, mistakenly, we submit C (again)
and D, causing a full preemption of B.

The race is avoided by keeping track of how many times a context has been
submitted to the hardware and by better discriminating the received context
switch interrupts: in the example, when we have submitted B twice, we won´t
submit C and D as soon as we receive the notification that B is completed
because we were expecting to get a LITE_RESTORE and we didn´t, so we know a
second completion will be received shortly.

Without this explicit checking, somehow, the batch buffer execution order
gets messed with. This can be verified with the IGT test I sent together with
the series. I don´t know the exact mechanism by which the pre-emption messes
with the execution order but, since other people is working on the Scheduler
+ Preemption on Execlists, I didn´t try to fix it. In these series, only Lite
Restores are supported (other kind of preemptions WARN).

v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
rebase changes.

v3: Clarify how the race is avoided, as requested by Daniel.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Align function parameters ...]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e1fee72c

drm/i915/bdw: Handle context switch events · e981e7b1

由 Thomas Daniel 提交于 7月 24, 2014

Handle all context status events in the context status buffer on every
context switch interrupt. We only remove work from the execlist queue
after a context status buffer reports that it has completed and we only
attempt to schedule new contexts on interrupt when a previously submitted
context completes (unless no contexts are queued, which means the GPU is
free).

We canot call intel_runtime_pm_get() in an interrupt (or with a spinlock
grabbed, FWIW), because it might sleep, which is not a nice thing to do.
Instead, do the runtime_pm get/put together with the create/destroy request,
and handle the forcewake get/put directly.
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>

v2: Unreferencing the context when we are freeing the request might free
the backing bo, which requires the struct_mutex to be grabbed, so defer
unreferencing and freeing to a bottom half.

v3:
- Ack the interrupt inmediately, before trying to handle it (fix for
missing interrupts by Bob Beckett <robert.beckett@intel.com>).
- Update the Context Status Buffer Read Pointer, just in case (spotted
by Damien Lespiau).

v4: New namespace and multiple rebase changes.

v5: Squash with "drm/i915/bdw: Do not call intel_runtime_pm_get() in an
interrupt", as suggested by Daniel.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch ...]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e981e7b1

drm/i915/bdw: Two-stage execlist submit process · acdd884a

由 Michel Thierry 提交于 7月 24, 2014

Context switch (and execlist submission) should happen only when
other contexts are not active, otherwise pre-emption occurs.

To assure this, we place context switch requests in a queue and those
request are later consumed when the right context switch interrupt is
received (still TODO).

v2: Use a spinlock, do not remove the requests on unqueue (wait for
context switch completion).
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>

v3: Several rebases and code changes. Use unique ID.

v4:
- Move the queue/lock init to the late ring initialization.
- Damien's kmalloc review comments: check return, use sizeof(*req),
do not cast.

v5:
- Do not reuse drm_i915_gem_request. Instead, create our own.
- New namespace.

Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2-v5)
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[davnet: Checkpatch + wash-up s/BUG_ON/WARN_ON/.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

acdd884a

drm/i915/bdw: Write the tail pointer, LRC style · ae1250b9

由 Oscar Mateo 提交于 7月 24, 2014

Each logical ring context has the tail pointer in the context object,
so update it before submission.

v2: New namespace.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ae1250b9

drm/i915/bdw: Implement context switching (somewhat) · 84b790f8

由 Ben Widawsky 提交于 7月 24, 2014

A context switch occurs by submitting a context descriptor to the
ExecList Submission Port. Given that we can now initialize a context,
it's possible to begin implementing the context switch by creating the
descriptor and submitting it to ELSP (actually two, since the ELSP
has two ports).

The context object must be mapped in the GGTT, which means it must exist
in the 0-4GB graphics VA range.
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>

v2: This code has changed quite a lot in various rebases. Of particular
importance is that now we use the globally unique Submission ID to send
to the hardware. Also, context pages are now pinned unconditionally to
GGTT, so there is no need to bind them.

v3: Use LRCA[31:12] as hwCtxId[19:0]. This guarantees that the HW context
ID we submit to the ELSP is globally unique and != 0 (Bspec requirements
of the software use-only bits of the Context ID in the Context Descriptor
Format) without the hassle of the previous submission Id construction.
Also, re-add the ELSP porting read (it was dropped somewhere during the
rebases).

v4:
- Squash with "drm/i915/bdw: Add forcewake lock around ELSP writes" (BSPEC
  says: "SW must set Force Wakeup bit to prevent GT from entering C6 while
  ELSP writes are in progress") as noted by Thomas Daniel
  (thomas.daniel@intel.com).
- Rename functions and use an execlists/intel_execlists_ namespace.
- The BUG_ON only checked that the LRCA was <32 bits, but it didn't make
  sure that it was properly aligned. Spotted by Alistair Mcaulay
  <alistair.mcaulay@intel.com>.

v5:
- Improved source code comments as suggested by Chris Wilson.
- No need to abstract submit_ctx away, as pointed by Brad Volkin.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Checkpatch. Sigh.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

84b790f8

drm/i915/bdw: Emission of requests with logical rings · 48e29f55

由 Oscar Mateo 提交于 7月 24, 2014

On a previous iteration of this patch, I created an Execlists
version of __i915_add_request and asbtracted it away as a
vfunc. Daniel Vetter wondered then why that was needed:

"with the clean split in command submission I expect every
function to know wether it'll submit to an lrc (everything in
intel_lrc.c) or wether it'll submit to a legacy ring (existing
code), so I don't see a need for an add_request vfunc."

The honest, hairy truth is that this patch is the glue keeping
the whole logical ring puzzle together:

- i915_add_request is used by intel_ring_idle, which in turn is
  used by i915_gpu_idle, which in turn is used in several places
  inside the eviction and gtt codes.
- Also, it is used by i915_gem_check_olr, which is littered all
  over i915_gem.c
- ...

If I were to duplicate all the code that directly or indirectly
uses __i915_add_request, I'll end up creating a separate driver.

To show the differences between the existing legacy version and
the new Execlists one, this time I have special-cased
__i915_add_request instead of adding an add_request vfunc. I
hope this helps to untangle this Gordian knot.
Signed-off-by: NOscar Mateo <oscar.mateo@intel.com>
Reviewed-by: NDamien Lespiau <damien.lespiau@intel.com>
[danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with
Thomas Daniel.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

48e29f55

drm/i915: Add temporary ring->ctx backpointer · 582d67f0

由 Oscar Mateo 提交于 7月 24, 2014

The execlist patches have a bit a convoluted and long history and due
to that have the actual submission still misplaced deeply burried in
the low-level ringbuffer handling code. This design goes back to the
legacy ringbuffer code with its tricky lazy request and simple work
submissiion using ring tail writes. For that reason they need a
ring->ctx backpointer.

The goal is to unburry that code and move it up into a level where the
full execlist context is available so that we can ditch this
backpointer. Until that's done make it really obvious that there's
work still to be done.

Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Acked-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

582d67f0