提交 · c6a2ac712d7dee13c13e44c4c4184478853dcb37 · openanolis / cloud-kernel

01 3月, 2016 1 次提交

drm/i915: Execlists small cleanups and micro-optimisations · c6a2ac71

由 Tvrtko Ursulin 提交于 2月 26, 2016

Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.

 * Remove needless initialization.
 * Improve cache locality by reorganizing code and/or using
   branch hints to keep unexpected or error conditions out
   of line.
 * Favor busy submit path vs. empty queue.
 * Less branching in hot-paths.

v2:

 * Avoid mmio reads when possible. (Chris Wilson)
 * Use natural integer size for csb indices.
 * Remove useless return value from execlists_update_context.
 * Extract 32-bit ppgtt PDPs update so it is out of line and
   shared with two callers.
 * Grab forcewake across all mmio operations to ease the
   load on uncore lock and use chepear mmio ops.

v3:

 * Removed some more pointless u8 data types.
 * Removed unused return from execlists_context_queue.
 * Commit message updates.

v4:
 * Unclumsify the unqueue if statement. (Chris Wilson)
 * Hide forcewake from the queuing function. (Chris Wilson)

Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.

Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.

Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.

One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com

c6a2ac71

25 1月, 2016 1 次提交

drm/i915/guc: Decouple GuC engine id from ring id · 397097b0

由 Alex Dai 提交于 1月 23, 2016

Previously GuC uses ring id as engine id because of same definition.
But this is not true since this commit:

commit de1add36
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Fri Jan 15 15:12:50 2016 +0000

    drm/i915: Decouple execbuf uAPI from internal implementation

Added GuC engine id into GuC interface to decouple it from ring id used
by driver.

v2: Keep ring name print out in debugfs; using for_each_ring() where
    possible to keep driver consistent. (Chris W.)
Signed-off-by: NAlex Dai <yu.dai@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453579094-29860-1-git-send-email-yu.dai@intel.com

397097b0

21 1月, 2016 4 次提交

drm/i915: Seal busy-ioctl uABI and prevent leaking of internal ids · 426960be

由 Chris Wilson 提交于 1月 15, 2016

Tvrtko was looking through the execbuffer-ioctl and noticed that the
uABI was tightly coupled to our internal engine identifiers. Close
inspection also revealed that we leak those internal engine identifiers
through the busy-ioctl, and those internal identifiers already do not
match the user identifiers. Fortuitiously, there is only one user of the
set of busy rings from the busy-ioctl, and they only wish to choose
between the RENDER and the BLT engines.

Let's fix the userspace ABI while we still can.

v2: Update the uAPI documentation to explain the identifiers.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_busy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452876706-21620-1-git-send-email-chris@chris-wilson.co.uk

426960be

drm/i915: Decouple execbuf uAPI from internal implementation · de1add36

由 Tvrtko Ursulin 提交于 1月 15, 2016

At the moment execbuf ring selection is fully coupled to
internal ring ids which is not a good thing on its own.

This dependency is also spread between two source files and
not spelled out at either side which makes it hidden and
fragile.

This patch decouples this dependency by introducing an explicit
translation table of execbuf uAPI to ring id close to the only
call site (i915_gem_do_execbuffer).

This way we are free to change driver internal implementation
details without breaking userspace. All state relating to the
uAPI is now contained in, or next to, i915_gem_do_execbuffer.

As a side benefit, this patch decreases the compiled size
of i915_gem_do_execbuffer.

v2: Extract ring selection into eb_select_ring. (Chris Wilson)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870770-13981-1-git-send-email-tvrtko.ursulin@linux.intel.com

de1add36

drm/i915: Use ordered seqno write interrupt generation on gen8+ execlists · 7c17d377

由 Chris Wilson 提交于 1月 20, 2016

Broadwell and later currently use the same unordered command sequence to
update the seqno in the HWS status page and then assert the user
interrupt. We should apply the w/a from legacy (where we do an mmio
read to delay the seqno read after the interrupt), but this is not
enough to enforce coherent seqno visibilty on Skylake. Rather than
search for the proper post-interrupt seqno barrier, use a strongly
ordered command sequence to write the seqno, then assert the user
interrupt from the ring.

v2: Move around the wa tail dwords to avoid adding duplicate code.

v3: Add references, comments on workarounds and bit5 check.

References: https://bugs.freedesktop.org/show_bug.cgi?id=93693
Testcase: igt/gem_ring_sync_loop #skl
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453297415-17793-1-git-send-email-mika.kuoppala@intel.com

7c17d377

drm/i915: abolish separate per-ring default_context pointers · ed54c1a1

由 Dave Gordon 提交于 1月 19, 2016

Now that we've eliminated a lot of uses of ring->default_context,
we can eliminate the pointer itself.

All the engines share the same default intel_context, so we can just
keep a single reference to it in the dev_priv structure rather than one
in each of the engine[] elements. This make refcounting more sensible
too, as we now have a refcount of one for the one pointer, rather than
a refcount of one but multiple pointers.

From an idea by Chris Wilson.

v2: transform an extra instance of ring->default_context introduced by
42f1cae8 drm/i915: Restore inhibiting the load of the default context
That patch's commentary includes:
v2: Mark the global default context as uninitialized on GPU reset so
that the context-local workarounds are reloaded upon re-enabling
The code implementing that now also benefits from the replacement of
the multiple (per-ring) pointers to the default context with a single
pointer to the unique kernel context.

v4: Rebased, remove underused local (Nick Hoath)
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NNick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1453230175-19330-3-git-send-email-david.s.gordon@intel.comSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ed54c1a1

20 1月, 2016 1 次提交

drm/i915: turn some bogus kernel-doc comments to normal comments · e2828914

由 Jani Nikula 提交于 1月 18, 2016

Apparently accidental or misplaced /** kernel-doc comments, confusing
the tool. Turn them to normal comments.
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1453101588-18008-2-git-send-email-jani.nikula@intel.com

e2828914

18 1月, 2016 2 次提交

drm/i915: Cache ringbuffer GTT VMA · 0eb973d3

由 Tvrtko Ursulin 提交于 1月 15, 2016

Purpose is to avoid calling i915_gem_obj_ggtt_offset from the
interrupt context without the big lock held.

v2: Renamed gtt_start to gtt_offset. (Daniel Vetter)
v3: Cache the VMA instead of address. (Chris Wilson)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-2-git-send-email-tvrtko.ursulin@linux.intel.com

0eb973d3

drm/i915: Do not call API requiring struct_mutex where it is not available · ca82580c

由 Tvrtko Ursulin 提交于 1月 15, 2016

LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that this patch caches some interesting bits and values
in the engine and context structures.

Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.

Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.

v2:
 * Cache the VMA instead of the address. (Chris Wilson)
 * Incorporate Dave Gordon's good comments and function name.

v3:
 * Extract ctx descriptor template to a function and group
   functions dealing with ctx descriptor & co together near
   top of the file. (Dave Gordon)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com

ca82580c

08 1月, 2016 1 次提交

drm/i915: Inspect subunit states on hangcheck · 61642ff0

由 Mika Kuoppala 提交于 12月 01, 2015

If head seems stuck and engine in question is rcs,
inspect subunit state transitions from undone to done,
before deciding that this really is a hang instead of limited
progress. Only account the transitions of subunits from
undone to done once, to prevent unstable subunit states
to keep us falsely active.

As this adds one extra steps to hangcheck heuristics,
before hang is declared, it adds 1500ms to to detect hang
for render ring to a total of 7500ms. We could sample
the subunit states on first head stuck condition but
decide not to do so only in order to mimic old behaviour. This
way the check order of promotion from seqno > atchd > instdone
is consistently done.

v2: Deal with unstable done states (Arun)
    Clear instdone progress on head and seqno movement (Chris)
    Report raw and accumulated instdone's in in debugfs (Chris)
    Return HANGCHECK_ACTIVE on undone->done

References: https://bugs.freedesktop.org/show_bug.cgi?id=93029
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1448985372-19535-1-git-send-email-mika.kuoppala@intel.com

61642ff0

10 12月, 2015 1 次提交

drm/i915: intel_ring_initialized() must be simple and inline · b0366a54

由 Dave Gordon 提交于 12月 08, 2015

Based on Chris Wilson's patch from 6 months ago, rebased and adapted.

The current implementation of intel_ring_initialized() is too heavyweight;
it's a non-inlined function that chases several levels of pointers. This
wouldn't matter too much if it were rarely called, but it's used inside
the iterator test of for_each_ring() and is therefore called quite
frequently. So let's make it simple and inline ...

The idea here is to use ring->dev as an indicator showing which engines
have been initialised and are therefore to be included in iterations that
use for_each_ring(). This allows us to avoid multiple memory references
and a (non-inlined) function call on each iteration of each such loop.

	Fixes regression from
	commit 48d82387
	Author: Oscar Mateo <oscar.mateo@intel.com>
	Date:   Thu Jul 24 17:04:23 2014 +0100

	    drm/i915/bdw: Generic logical ring init and cleanup
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1449586956-32360-2-git-send-email-david.s.gordon@intel.com

b0366a54

18 11月, 2015 2 次提交

drm/i915: Type safe register read/write · f0f59a00

由 Ville Syrjälä 提交于 11月 18, 2015

Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.

This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.

The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.

As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)

So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.

v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com

f0f59a00

drm/i915: Add functions to emit register offsets to the ring · f92a9162

由 Ville Syrjälä 提交于 11月 04, 2015

When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.

v2: Convert MOCS setup too
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>

f92a9162

29 10月, 2015 1 次提交

drm/i915: Recover all available ringbuffer space following reset · 608c1a52

由 Chris Wilson 提交于 9月 03, 2015

Having flushed all requests from all queues, we know that all
ringbuffers must now be empty. However, since we do not reclaim
all space when retiring the request (to prevent HEADs colliding
with rapid ringbuffer wraparound) the amount of available space
on each ringbuffer upon reset is less than when we start. Do one
more pass over all the ringbuffers to reset the available space
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>

608c1a52

04 9月, 2015 1 次提交

drm/i915: Refactor common ringbuffer allocation code · 01101fa7

由 Chris Wilson 提交于 9月 03, 2015

A small, very small, step to sharing the duplicate code between
execlists and legacy submission engines, starting with the ringbuffer
allocation code.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: NPaulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

01101fa7

26 8月, 2015 1 次提交

drm/i915/bxt: work around HW coherency issue when accessing GPU seqno · 319404df

由 Imre Deak 提交于 8月 14, 2015

By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or that the command didn't complete somehow: when
the coherency issue occured I read the seqno via an uncached GTT mapping
too. While the cached version of the seqno still showed the stale value
the one read via the uncached mapping was the correct one.

Work around this issue by clflushing the corresponding CPU cacheline
following any store of the seqno and preceding any reading of it. When
reading it do this only when the caller expects a coherent view.

v2:
- fix using the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A stepping, on later steppings this HW issue
  is fixed
v3:
- use a separate get_seqno/set_seqno vfunc (Chris)

Testcase: igt/store_dword_loop_render
Signed-off-by: NImre Deak <imre.deak@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

319404df

14 7月, 2015 1 次提交

drm/i915: Snapshot seqno of most recently submitted request. · 94f7bbe1

由 Tomas Elf 提交于 7月 09, 2015

The hang checker needs to inspect whether or not the ring request list is empty
as well as if the given engine has reached or passed the most recently
submitted request. The problem with this is that the hang checker cannot grab
the struct_mutex, which is required in order to safely inspect requests since
requests might be deallocated during inspection. In the past we've had kernel
panics due to this very unsynchronized access in the hang checker.

One solution to this problem is to not inspect the requests directly since
we're only interested in the seqno of the most recently submitted request - not
the request itself. Instead the seqno of the most recently submitted request is
stored separately, which the hang checker then inspects, circumventing the
issue of synchronization from the hang checker entirely.

This fixes a regression introduced in

commit 44cdd6d2
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:40 2014 +0000

    drm/i915: Convert 'ring_idle()' to use requests not seqnos

v2 (Chris Wilson):
- Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
than compute it over again.
- Remove extra whitespace.

Issue: VIZ-5998
Signed-off-by: NTomas Elf <tomas.elf@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Add regressing commit citation provided by Chris.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

94f7bbe1

06 7月, 2015 1 次提交

drm/i915: Enable resource streamer bits on MI_BATCH_BUFFER_START · 919032ec

由 Abdiel Janulgue 提交于 6月 16, 2015

Adds support for enabling the resource streamer on the legacy
ringbuffer for HSW and GEN8.
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NAbdiel Janulgue <abdiel.janulgue@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

919032ec

03 7月, 2015 1 次提交

drm/i915: Reserve space improvements · 79bbcc29

由 John Harrison 提交于 6月 30, 2015

An earlier patch was added to reserve space in the ring buffer for the
commands issued during 'add_request()'. The initial version was
pessimistic in the way it handled buffer wrapping and would cause
premature wraps and thus waste ring space.

This patch updates the code to better handle the wrap case. It no
longer enforces that the space being asked for and the reserved space
are a single contiguous block. Instead, it allows the reserve to be on
the far end of a wrap operation. It still guarantees that the space is
available so when the wrap occurs, no wait will happen. Thus the wrap
cannot fail which is the whole point of the exercise.

Also fixed a merge failure with some comments from the original patch.

v2: Incorporated suggestion by David Gordon to move the wrap code
inside the prepare function and thus allow a single combined
wait_for_space() call rather than doing one before the wrap and
another after. This also makes the prepare code much simpler and
easier to follow.

v3: Fix for 'effective_size' vs 'size' during ring buffer remainder
calculations (spotted by Tomas Elf).

For: VIZ-5115
CC: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

79bbcc29

23 6月, 2015 19 次提交

drm/i915: Remove the now obsolete 'outstanding_lazy_request' · bccca494

由 John Harrison 提交于 5月 29, 2015

The outstanding_lazy_request is no longer used anywhere in the driver.
Everything that was looking at it now has a request explicitly passed in from on
high. Everything that was relying upon it behind the scenes is now explicitly
creating/passing/submitting its own private request. Thus the OLR can be
removed.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

bccca494

drm/i915: Remove the now obsolete intel_ring_get_request() · 59c35a4d

由 John Harrison 提交于 5月 29, 2015

Much of the driver has now been converted to passing requests around instead of
rings/ringbufs/contexts. Thus the function for retreiving the request from a
ring (i.e. the OLR) is no longer used and can be removed.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

59c35a4d

drm/i915: Add *_ring_begin() to request allocation · ccd98fe4

由 John Harrison 提交于 5月 29, 2015

Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ccd98fe4

drm/i915: Update intel_ring_begin() to take a request structure · 5fb9de1a

由 John Harrison 提交于 5月 29, 2015

Now that everything above has been converted to use requests, intel_ring_begin()
can be updated to take a request instead of a ring. This also means that it no
longer needs to lazily allocate a request if no-one happens to have done it
earlier.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5fb9de1a

drm/i915: Update cacheline_align() to take a request structure · bba09b12

由 John Harrison 提交于 5月 29, 2015

Updated intel_ring_cacheline_align() to take a request instead of a ring.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

bba09b12

drm/i915: Update ring->signal() to take a request structure · f7169687

由 John Harrison 提交于 5月 29, 2015

Updated the various ring->signal() implementations to take a request instead of
a ring. This removes their reliance on the OLR to obtain the seqno value that
should be used for the signal.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f7169687

drm/i915: Update ring->sync_to() to take a request structure · 599d924c

由 John Harrison 提交于 5月 29, 2015

Updated the ring->sync_to() implementations to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the patch which added ->uniq.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

599d924c

drm/i915: Update ring->emit_bb_start() to take a request structure · be795fc1

由 John Harrison 提交于 5月 29, 2015

Updated the ring->emit_bb_start() implementation to take a request instead of a
ringbuf/context pair.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

be795fc1

drm/i915: Update ring->dispatch_execbuffer() to take a request structure · 53fddaf7

由 John Harrison 提交于 5月 29, 2015

Updated the various ring->dispatch_execbuffer() implementations to take a
request instead of a ring.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

53fddaf7

drm/i915: Update ring->emit_request() to take a request structure · c4e76638

由 John Harrison 提交于 5月 29, 2015

Updated the ring->emit_request() implementation to take a request instead of a
ringbuf/request pair. Also removed its use of the OLR for obtaining the
request's seqno.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

c4e76638

drm/i915: Update ring->add_request() to take a request structure · ee044a88

由 John Harrison 提交于 5月 29, 2015

Updated the various ring->add_request() implementations to take a request
instead of a ring. This removes their reliance on the OLR to obtain the seqno
value that the request should be tagged with.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ee044a88

drm/i915: Update ring->emit_flush() to take a request structure · 7deb4d39

由 John Harrison 提交于 5月 29, 2015

Updated the various ring->emit_flush() implementations to take a request instead
of a ringbuf/context pair.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7deb4d39

drm/i915: Update ring->flush() to take a requests structure · a84c3ae1

由 John Harrison 提交于 5月 29, 2015

Updated the various ring->flush() functions to take a request instead of a ring.
Also updated the tracer to include the request id.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
[danvet: Rebase since I didn't merge the addition of req->uniq.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a84c3ae1

drm/i915: Update flush_all_caches() to take request structures · 4866d729

由 John Harrison 提交于 5月 29, 2015

Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4866d729

drm/i915: Update a bunch of execbuffer helpers to take request structures · 2f20055d

由 John Harrison 提交于 5月 29, 2015

Updated *_ring_invalidate_all_caches(), i915_reset_gen7_sol_offsets() and
i915_emit_box() to take request structures instead of ring or ringbuf/context
pairs.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2f20055d

drm/i915: Update queue_flip() to take a request structure · 6258fbe2

由 John Harrison 提交于 5月 29, 2015

Updated the display page flip code to do explicit request creation and
submission rather than relying on the OLR and just hoping that the request
actually gets submitted at some random point.

The sequence is now to create a request, queue the work to the ring, assign the
known request to the flip queue work item then actually submit the work and post
the request.

Note that every single flip function used to finish with
'__intel_ring_advance(ring);'. However, immediately after they return there is
now an add request call which will do the advance anyway. Thus the many
duplicate advance calls have been removed.

v2: Updated commit message with comment about advance removal.

v3: The request can now be allocated by the _sync() code earlier on. Thus the
page flip path does not necessarily need to allocate a new request, it may be
able to re-use one.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6258fbe2

drm/i915: Update init_context() to take a request structure · 8753181e

由 John Harrison 提交于 5月 29, 2015

Now that everything above has been converted to use requests, it is possible to
update init_context() to take a request pointer instead of a ring/context pair.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8753181e

drm/i915: Reserve ring buffer space for i915_add_request() commands · 29b1b415

由 John Harrison 提交于 6月 18, 2015

It is a bad idea for i915_add_request() to fail. The work will already have been
send to the ring and will be processed, but there will not be any tracking or
management of that work.

The only way the add request call can fail is if it can't write its epilogue
commands to the ring (cache flushing, seqno updates, interrupt signalling). The
reasons for that are mostly down to running out of ring buffer space and the
problems associated with trying to get some more. This patch prevents that
situation from happening in the first place.

When a request is created, it marks sufficient space as reserved for the
epilogue commands. Thus guaranteeing that by the time the epilogue is written,
there will be plenty of space for it. Note that a ring_begin() call is required
to actually reserve the space (and do any potential waiting). However, that is
not currently done at request creation time. This is because the ring_begin()
code can allocate a request. Hence calling begin() from the request allocation
code would lead to infinite recursion! Later patches in this series remove the
need for begin() to do the allocate. At that point, it becomes safe for the
allocate to call begin() and really reserve the space.

Until then, there is a potential for insufficient space to be available at the
point of calling i915_add_request(). However, that would only be in the case
where the request was created and immediately submitted without ever calling
ring_begin() and adding any work to that request. Which should never happen. And
even if it does, and if that request happens to fall down the tiny window of
opportunity for failing due to being out of ring space then does it really
matter because the request wasn't doing anything in the first place?

v2: Updated the 'reserved space too small' warning to include the offending
sizes. Added a 'cancel' operation to clean up when a request is abandoned. Added
re-initialisation of tracking state after a buffer wrap to keep the sanity
checks accurate.

v3: Incremented the reserved size to accommodate Ironlake (after finally
managing to run on an ILK system). Also fixed missing wrap code in LRC mode.

v4: Added extra comment and removed duplicate WARN (feedback from Tomas).

For: VIZ-5115
CC: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

29b1b415

drm/i915/gen8: Add infrastructure to initialize WA batch buffers · 17ee950d

由 Arun Siluvery 提交于 6月 19, 2015

Some of the WA are to be applied during context save but before restore and
some at the end of context save/restore but before executing the instructions
in the ring, WA batch buffers are created for this purpose and these WA cannot
be applied using normal means. Each context has two registers to load the
offsets of these batch buffers. If they are non-zero, HW understands that it
need to execute these batches.

v1: In this version two separate ring_buffer objects were used to load WA
instructions for indirect and per context batch buffers and they were part
of every context.

v2: Chris suggested to include additional page in context and use it to load
these WA instead of creating separate objects. This will simplify lot of things
as we need not explicity pin/unpin them. Thomas Daniel further pointed that GuC
is planning to use a similar setup to share data between GuC and driver and
WA batch buffers can probably share that page. However after discussions with
Dave who is implementing GuC changes, he suggested to use an independent page
for the reasons - GuC area might grow and these WA are initialized only once and
are not changed afterwards so we can share them share across all contexts.

The page is updated with WA during render ring init. This has an advantage of
not adding more special cases to default_context.

We don't know upfront the number of WA we will applying using these batch buffers.
For this reason the size was fixed earlier but it is not a good idea. To fix this,
the functions that load instructions are modified to report the no of commands
inserted and the size is now calculated after the batch is updated. A macro is
introduced to add commands to these batch buffers which also checks for overflow
and returns error.
We have a full page dedicated for these WA so that should be sufficient for
good number of WA, anything more means we have major issues.
The list for Gen8 is small, same for Gen9 also, maybe few more gets added
going forward but not close to filling entire page. Chris suggested a two-pass
approach but we agreed to go with single page setup as it is a one-off routine
and simpler code wins.

One additional option is offset field which is helpful if we would like to
have multiple batches at different offsets within the page and select them
based on some criteria. This is not a requirement at this point but could
help in future (Dave).

Chris provided some helpful macros and suggestions which further simplified
the code, they will also help in reducing code duplication when WA for
other Gen are added. Add detailed comments explaining restrictions.
Use do {} while(0) for wa_ctx_emit() macro.

(Many thanks to Chris, Dave and Thomas for their reviews and inputs)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: NRafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: NArun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

17ee950d

15 6月, 2015 2 次提交

drm/i915: Extend the parser to check register writes against a mask/value pair. · c1091b25

由 Francisco Jerez 提交于 5月 29, 2015

In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits.  This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset.  For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.
Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
Reviewed-by: NZhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

c1091b25

drm/i915: Extend the parser to check register writes against a mask/value pair. · 4e86f725

由 Francisco Jerez 提交于 5月 29, 2015

In some cases it might be unnecessary or dangerous to give userspace
the right to write arbitrary values to some register, even though it
might be desirable to give it control of some of its bits. This patch
extends the register whitelist entries to contain a mask/value pair in
addition to the register offset. For registers with non-zero mask,
any LRM writes and LRI writes where the bits of the immediate given by
the mask don't match the specified value will be rejected.

This will be used in my next patch to grant userspace partial write
access to some sensitive registers.
Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
Reviewed-by: NZhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4e86f725

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功