提交 · 2335986dd46f4fdfd11d613b68f4879c92726b47 · openeuler / Kernel

09 5月, 2016 1 次提交

drm/i915: Store a i915 backpointer from engine, and use it · c033666a

由 Chris Wilson 提交于 5月 06, 2016

   text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk

c033666a

28 4月, 2016 4 次提交

drm/i915: Stop tracking execlists retired requests · e39d42fa

由 Tvrtko Ursulin 提交于 4月 28, 2016

With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
request on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver causing performance hits on some tests. Most
dramatical of which was igt/gem_close_race which had run time
in tens of minutes which is now reduced to tens of seconds.
Signed-off-by: NTvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-24-git-send-email-chris@chris-wilson.co.uk

e39d42fa

drm/i915: Move the magical deferred context allocation into the request · 978f1e09

由 Chris Wilson 提交于 4月 28, 2016

We can hide more details of execlists from higher level code by removing
the explicit call to create an execlist context from execbuffer and
into its first use by execlists.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-20-git-send-email-chris@chris-wilson.co.uk

978f1e09

drm/i915: Replace the pinned context address with its unique ID · 7069b144

由 Chris Wilson 提交于 4月 28, 2016

Rather than reuse the current location of the context in the global GTT
for its hardware identifier, use the context's unique ID assigned to it
for its whole lifetime.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-18-git-send-email-chris@chris-wilson.co.uk

7069b144

drm/i915: Unify intel_ring_begin() · 987046ad

由 Chris Wilson 提交于 4月 28, 2016

Combine the near identical implementations of intel_logical_ring_begin()
and intel_ring_begin() - the only difference is that the logical wait
has to check for a matching ring (which is assumed by legacy).

In the process some debug messages are culled as there were following a
WARN if we hit an actual error.

v2: Updated commentary
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1461833819-3991-12-git-send-email-chris@chris-wilson.co.uk

987046ad

14 4月, 2016 1 次提交

drm/i915: Disentangle i915_drv.h includes · e73bdd20

由 Chris Wilson 提交于 4月 13, 2016

Separate out the layers of includes (linux, drm, intel, i915) so that it
is a little easier to order our definitions between our multiple
reentrant headers. A couple of headers needed fixes to make them more
standalone (forgotten includes, forward declarations etc).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-2-git-send-email-chris@chris-wilson.co.uk

e73bdd20

13 4月, 2016 1 次提交

drm/i915: Use new i915_gem_object_pin_map for LRC · 7d774cac

由 Tvrtko Ursulin 提交于 4月 12, 2016

We can use the new pin/lazy unpin API for simplicity
and more performance in the execlist submission paths.

v2:
  * Fix error handling and convert more users.
  * Compact some names for readability.

v3:
  * intel_lr_context_free was not unpinning.
  * Special case for GPU reset which otherwise unbalances
    the HWS object pages pin count by running the engine
    initialization only (not destructors).

v4:
  * Rebased on top of hws setup/init split.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460472042-1998-1-git-send-email-tvrtko.ursulin@linux.intel.com
[tursulin: renames: s/hwd/hws/, s/obj_addr/vaddr/]
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>

7d774cac

12 4月, 2016 1 次提交

drm/i915: Only grab correct forcewake for the engine with execlists · 3756685a

由 Tvrtko Ursulin 提交于 4月 12, 2016

Rather than blindly waking up all forcewake domains on command
submission, we can teach each engine what is (or are) the correct
one to take.

On platforms with multiple forcewake domains like VLV, CHV, SKL
and BXT, this has the potential of lowering the GPU and CPU
power use and submission latency.

To implement it we add a function named
intel_uncore_forcewake_for_reg whose purpose is to query which
forcewake domains need to be taken to read or write a specific
register with raw mmio accessors.

These enables the execlists engine setup  to query which
forcewake domains are relevant per engine on the currently
running platform.

v2:
  * Kerneldoc.
  * Split from intel_uncore.c macro extraction, WARN_ON,
    no warns on old platforms. (Chris Wilson)

v3:
  * Single domain per engine, mention all registers,
    bi-directional function and a new name, fix handling
    of gen6 and gen7 writes. (Chris Wilson)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com

3756685a

04 4月, 2016 1 次提交

drm/i915: Move execlists irq handler to a bottom half · 27af5eea

由 Tvrtko Ursulin 提交于 4月 04, 2016

Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
 Modules linked in: [redacted for brevity]
 CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
 Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
 Workqueue: i915 gen6_pm_rps_work [i915]
 task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
 RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
 RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
 RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
 RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
 RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
 R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
 R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
 FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
  0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
  0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
 Call Trace:
  <IRQ>
  [<ffffffff8104a716>] irq_exit+0x86/0x90
  [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
  [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
  <EOI>
  [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
  [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
  [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
  [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
  [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
  [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
  [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
  [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
  [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
  [<ffffffff8105ab29>] process_one_work+0x139/0x350
  [<ffffffff8105b186>] worker_thread+0x126/0x490
  [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
  [<ffffffff8105fa64>] kthread+0xc4/0xe0
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
  [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.

When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.

By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.

Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.

Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)

More interesting scenario with regards to throughput is
"gem_latency -n 100" which  shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
   * execlists_lock should be taken as spin_lock_bh when
     queuing work from userspace now. (Chris Wilson)
   * uncore.lock must be taken with spin_lock_irq when
     submitting requests since that now runs from either
     softirq or process context.

v3:
   * Expanded commit message with more testing data;
   * converted missed locking sites to _bh;
   * added execlist_lock comment. (Chris Wilson)

v4:
   * Mention dispatch parallelism in commit. (Chris Wilson)
   * Do not hold uncore.lock over MMIO reads since the block
     is already serialised per-engine via the tasklet itself.
     (Chris Wilson)
   * intel_lrc_irq_handler should be static. (Chris Wilson)
   * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
   * Document and WARN that tasklet cannot be active/pending
     on engine cleanup. (Chris Wilson/Imre Deak)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com

27af5eea

16 3月, 2016 1 次提交

drm/i915: Rename intel_engine_cs function parameters · 0bc40be8

由 Tvrtko Ursulin 提交于 3月 16, 2016

@@
identifier func;
@@
func(..., struct intel_engine_cs *
- ring
+ engine
, ...)
{
<...
- ring
+ engine
...>
}
@@
identifier func;
type T;
@@
T func(..., struct intel_engine_cs *
- ring
+ engine
, ...);
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>

0bc40be8

29 1月, 2016 1 次提交

drm/i915: Make LRC (un)pinning work on context and engine · e5292823

由 Tvrtko Ursulin 提交于 1月 28, 2016

Previously intel_lr_context_(un)pin were operating on requests
which is in conflict with their names.

If we make them take a context and an engine, it makes the names
make more sense and it also makes future fixes possible.

v2: Rebase for default_context/kernel_context change.
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>

e5292823

18 1月, 2016 1 次提交

drm/i915: Do not call API requiring struct_mutex where it is not available · ca82580c

由 Tvrtko Ursulin 提交于 1月 15, 2016

LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that this patch caches some interesting bits and values
in the engine and context structures.

Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.

Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.

v2:
 * Cache the VMA instead of the address. (Chris Wilson)
 * Incorporate Dave Gordon's good comments and function name.

v3:
 * Extract ctx descriptor template to a function and group
   functions dealing with ctx descriptor & co together near
   top of the file. (Dave Gordon)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452870629-13830-1-git-send-email-tvrtko.ursulin@linux.intel.com

ca82580c

07 1月, 2016 1 次提交

drm/i915: Cleanup some of the CSB handling · 5590a5f0

由 Ben Widawsky 提交于 1月 05, 2016

I think this patch is a worthwhile cleanup even if it might look only marginally
useful. It gets more useful in upcoming patches and for handling of future GEN
platforms.

The only non-mechanical part of this is the removal of the extra & operation on
the ring->next_context_status_buffer. This is safe because right above this, we
already did a modulus operation.
Signed-off-by: NBen Widawsky <benjamin.widawsky@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1452018609-10142-2-git-send-email-benjamin.widawsky@intel.comReviewed-by: NMichel Thierry <michel.thierry@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5590a5f0

05 1月, 2016 1 次提交

drm/i915/guc: Expose (intel)_lr_context_size() · 95a66f7e

由 Dave Gordon 提交于 12月 18, 2015

The GuC code needs to know the size of a logical context, so we
expose get_lr_context_size(), renaming it intel_lr_context__size()
to fit the naming conventions for nonstatic functions.

For: VIZ-2021
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Signed-off-by: NAlex Dai <yu.dai@intel.com>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1450468812-4882-2-git-send-email-yu.dai@intel.comSigned-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

95a66f7e

05 12月, 2015 1 次提交

Revert "drm/i915: Extend LRC pinning to cover GPU context writeback" · af3302b9

由 Daniel Vetter 提交于 12月 04, 2015

This reverts commit 6d65ba94.

Mika Kuoppala traced down a use-after-free crash in module unload to
this commit, because ring->last_context is leaked beyond when the
context gets destroyed. Mika submitted a quick fix to patch that up in
the context destruction code, but that's too much of a hack.

The right fix is instead for the ring to hold a full reference onto
it's last context, like we do for legacy contexts.

Since this is causing a regression in BAT it gets reverted before we
can close this.

Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93248Acked-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

af3302b9

03 12月, 2015 1 次提交

drm/i915: Extend LRC pinning to cover GPU context writeback · 6d65ba94

由 Nick Hoath 提交于 12月 01, 2015

Use the first retired request on a new context to unpin
the old context. This ensures that the hw context remains
bound until it has been written back to by the GPU.
Now that the context is pinned until later in the request/context
lifecycle, it no longer needs to be pinned from context_queue to
retire_requests.
This fixes an issue with GuC submission where the GPU might not
have finished writing back the context before it is unpinned. This
results in a GPU hang.

v2: Moved the new pin to cover GuC submission (Alex Dai)
    Moved the new unpin to request_retire to fix coverage leak
v3: Added switch to default context if freeing a still pinned
    context just in case the hw was actually still using it
v4: Unwrapped context unpin to allow calling without a request
v5: Only create a switch to idle context if the ring doesn't
    already have a request pending on it (Alex Dai)
    Rename unsaved to dirty to avoid double negatives (Dave Gordon)
    Changed _no_req postfix to __ prefix for consistency (Dave Gordon)
    Split out per engine cleanup from context_free as it
    was getting unwieldy
    Corrected locking (Dave Gordon)
v6: Removed some bikeshedding (Mika Kuoppala)
    Added explanation of the GuC hang that this fixes (Daniel Vetter)
v7: Removed extra per request pinning from ring reset code (Alex Dai)
    Added forced ring unpin/clean in error case in context free (Alex Dai)
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
Issue: VIZ-4277
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: NAlex Dai <yu.dai@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6d65ba94

18 11月, 2015 2 次提交

drm/i915: Type safe register read/write · f0f59a00

由 Ville Syrjälä 提交于 11月 18, 2015

Make I915_READ and I915_WRITE more type safe by wrapping the register
offset in a struct. This should eliminate most of the fumbles we've had
with misplaced parens.

This only takes care of normal mmio registers. We could extend the idea
to other register types and define each with its own struct. That way
you wouldn't be able to accidentally pass the wrong thing to a specific
register access function.

The gpio_reg setup is probably the ugliest thing left. But I figure I'd
just leave it for now, and wait for some divine inspiration to strike
before making it nice.

As for the generated code, it's actually a bit better sometimes. Eg.
looking at i915_irq_handler(), we can see the following change:
lea 0x70024(%rdx,%rax,1),%r9d
mov $0x1,%edx
- movslq %r9d,%r9
- mov %r9,%rsi
- mov %r9,-0x58(%rbp)
- callq *0xd8(%rbx)
+ mov %r9d,%esi
+ mov %r9d,-0x48(%rbp)
callq *0xd8(%rbx)

So previously gcc thought the register offset might be signed and
decided to sign extend it, just in case. The rest appears to be
mostly just minor shuffling of instructions.

v2: i915_mmio_reg_{offset,equal,valid}() helpers added
s/_REG/_MMIO/ in the register defines
mo more switch statements left to worry about
ring_emit stuff got sorted in a prep patch
cmd parser, lrc context and w/a batch buildup also in prep patch
vgpu stuff cleaned up and moved to a prep patch
all other unrelated changes split out
v3: Rebased due to BXT DSI/BLC, MOCS, etc.
v4: Rebased due to churn, s/i915_mmio_reg_t/i915_reg_t/
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1447853606-2751-1-git-send-email-ville.syrjala@linux.intel.com

f0f59a00

drm/i915: Add functions to emit register offsets to the ring · f92a9162

由 Ville Syrjälä 提交于 11月 04, 2015

When register type safety happens, we can't just try to emit the
register itself to the ring. Instead we'll need to extract the
offset from it first. Add some convenience functions that will do
that.

v2: Convert MOCS setup too
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1446672017-24497-20-git-send-email-ville.syrjala@linux.intel.comReviewed-by: NChris Wilson <chris@chris-wilson.co.uk>

f92a9162

28 9月, 2015 1 次提交

drm/i915: Consider HW CSB write pointer before resetting the sw read pointer · dfc53c5e

由 Michel Thierry 提交于 9月 28, 2015

A previous commit resets the Context Status Buffer (CSB) read pointer in
ring init
    commit c0a03a2e ("drm/i915: Reset CSB read pointer in ring init")

This is generally correct, but this pointer is not reset after
suspend/resume in some platforms (cht). In this case, the driver should
read the register value instead of resetting the sw read counter to 0.
Otherwise we process old events, leading to unwanted pre-emptions or
something worse.

But in other platforms (bdw) and also during GPU reset or power up, the
CSBWP is reset to 0x7 (an invalid number), and in this case the read
pointer should be set to 5 (the interrupt code will increment this
counter one more time, and will start reading from CSB[0]).

v2: When the CSB registers are reset, the read pointer needs to be set
to 5, otherwise the first write (CSB[0]) won't be read (Mika).
Replace magic numbers with GEN8_CSB_ENTRIES (6) and GEN8_CSB_PTR_MASK
(0x07).

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: NLei Shen <lei.shen@intel.com>
Signed-off-by: NDeepak S <deepak.s@intel.com>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

dfc53c5e

23 9月, 2015 1 次提交

drm/i915: Parametrize LRC registers · 83843d84

由 Ville Syrjälä 提交于 9月 18, 2015

Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NJani Nikula <jani.nikula@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

83843d84

14 9月, 2015 1 次提交

drm/i915: Split alloc from init for lrc · e84fe803

由 Nick Hoath 提交于 9月 11, 2015

Extend init/init_hw split to context init.
   - Move context initialisation in to i915_gem_init_hw
   - Move one off initialisation for render ring to
        i915_gem_validate_context
   - Move default context initialisation to logical_ring_init

Rename intel_lr_context_deferred_create to
intel_lr_context_deferred_alloc, to reflect reduced functionality &
alloc/init split.

This patch is intended to split out the allocation of resources &
initialisation to allow easier reuse of code for resume/gpu reset.

v2: Removed function ptr wrapping of do_switch_context (Daniel Vetter)
    Left ->init_context int intel_lr_context_deferred_alloc
    (Daniel Vetter)
    Remove unnecessary init flag & ring type test. (Daniel Vetter)
    Improve commit message (Daniel Vetter)
v3: On init/reinit, set the hw next sequence number to the sw next
    sequence number. This is set to 1 at driver load time. This prevents
    the seqno being reset on reinit (Chris Wilson)
v4: Set seqno back to ~0 - 0x1000 at start-of-day, and increment by 0x100
    on reset.
    This makes it obvious which bbs are which after a reset. (David Gordon
    & John Harrison)
    Rebase.
v5: Rebase. Fixed rebase breakage. Put context pinning in separate
    function. Removed code churn. (Thomas Daniel)
v6: Cleanup up issues introduced in v2 & v5 (Thomas Daniel)

Issue: VIZ-4798
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: David Gordon <david.s.gordon@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e84fe803

15 8月, 2015 2 次提交

drm/i915: Integrate GuC-based command submission · d1675198

由 Alex Dai 提交于 8月 12, 2015

GuC-based submission is mostly the same as execlist mode, up to
intel_logical_ring_advance_and_submit(), where the context being
dispatched would be added to the execlist queue; at this point
we submit the context to the GuC backend instead.

There are, however, a few other changes also required, notably:
1.  Contexts must be pinned at GGTT addresses accessible by the GuC
    i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
    PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.

2.  The GuC's TLB must be invalidated after a context is pinned at
    a new GGTT address.

3.  GuC firmware uses the one page before Ring Context as shared data.
    Therefore, whenever driver wants to get base address of LRC, we
    will offset one page for it. LRC_PPHWSP_PN is defined as the page
    number of LRCA.

4.  In the work queue used to pass requests to the GuC, the GuC
    firmware requires the ring-tail-offset to be represented as an
    11-bit value, expressed in QWords. Therefore, the ringbuffer
    size must be reduced to the representable range (4 pages).

v2:
    Defer adding #defines until needed [Chris Wilson]
    Rationalise type declarations [Chris Wilson]

v4:
    Squashed kerneldoc patch into here [Daniel Vetter]

v5:
    Update request->tail in code common to both GuC and execlist modes.
    Add a private version of lr_context_update(), as sharing the
        execlist version leads to race conditions when the CPU and
        the GuC both update TAIL in the context image.
    Conversion of error-captured HWS page to string must account
        for offset from start of object to actual HWS (LRC_PPHWSP_PN).

Issue: VIZ-4884
Signed-off-by: NAlex Dai <yu.dai@intel.com>
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NTom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d1675198

drm/i915: Expose one LRC function for GuC submission mode · 919f1f55

由 Dave Gordon 提交于 8月 12, 2015

GuC submission is basically execlist submission, but with the GuC
handling the actual writes to the ELSP and the resulting context
switch interrupts.  So to describe a context for submission via
the GuC, we need one of the same functions used in execlist mode.
This commit exposes one such function, changing its name to better
describe what it does (it's related to logical ring contexts rather
than to execlists per se).

v2:
    Replaces previous "drm/i915: Move execlists defines from .c to .h"

v3:
    Incorporates a change to one of the functions exposed here that was
        previously part of an internal patch, but which was omitted from
        the version recently committed to drm-intel-nightly:
	    7a01a0a2 drm/i915/lrc: Update PDPx registers with lri commands
        So we reinstate this change here.

v4:
    Drop v3 change, update function parameters due to collision with
        8ee36152 drm/i915: Convert execlists_ctx_descriptor() for requests

v5:
    Don't expose execlists_update_context() after all. The current
        version is no longer compatible with GuC submission; trying to
        share the execlist version of this function results in both GuC
        and CPU updating TAIL in the context image, with bad results when
        they get out of step. The GuC submission path now has its own
        private version that just updates the ringbuffer start address,
        and not TAIL or PDPx.

v6:
    Rebased

Issue: VIZ-4884
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NTom O'Rourke <Tom.O'Rourke@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

919f1f55

14 7月, 2015 1 次提交

drm/i915: Added Programming of the MOCS · 3bbaba0c

由 Peter Antoine 提交于 7月 10, 2015

This change adds the programming of the MOCS registers to the gen 9+
platforms. The set of MOCS configuration entries introduced by this
patch is intended to be minimal but sufficient to cover the needs of
current userspace - i.e. a good set of defaults. It is expected to be
extended in the future to provide further default values or to allow
userspace to redefine its private MOCS tables based on its demand for
additional caching configurations. In this setup, userspace should
only utilize the first N entries, higher entries are reserved for
future use.

It creates a fixed register set that is programmed across the different
engines so that all engines have the same table. This is done as the
main RCS context only holds the registers for itself and the shared
L3 values. By trying to keep the registers consistent across the
different engines it should make the programming for the registers
consistent.

v2:
-'static const' for private data structures and style changes.(Matt Turner)
v3:
- Make the tables "slightly" more readable. (Damien Lespiau)
- Updated tables fix performance regression.
v4:
- Code formatting. (Chris Wilson)
- re-privatised mocs code. (Daniel Vetter)
v5:
- Changed the name of a function. (Chris Wilson)
v6:
- re-based
- Added Mesa table entry (skylake & broxton) (Francisco Jerez)
- Tidied up the readability defines (Francisco Jerez)
- NUMBER of entries defines wrong. (Jim Bish)
- Added comments to clear up the meaning of the tables (Jim Bish)
Signed-off-by: NPeter Antoine <peter.antoine@intel.com>

v7 (Francisco Jerez):
- Don't write L3-specific MOCS_ESC/SCC values into the e/LLC control
  tables.  Prefix L3-specific defines consistently with L3_ and
  e/LLC-specific defines with LE_ to avoid this kind of confusion in
  the future.
- Change L3CC WT define back to RESERVED (matches my hardware
  documentation and the original patch, probably a misunderstanding
  of my own previous comment).
- Drop Android tables, define new minimal tables more suitable for the
  open source stack.
- Add comment that the MOCS tables are part of the kernel ABI.
- Move intel_logical_ring_begin() and _advance() calls one level down
  (Chris Wilson).
- Minor formatting and style fixes.
v8 (Francisco Jerez):
- Add table size sanity check to emit_mocs_control/l3cc_table() (Chris
  Wilson).
- Add comment about undefined entries being implicitly set to uncached
  for forwards compatibility.
v9 (Francisco Jerez):
- Minor style fixes.
Signed-off-by: NFrancisco Jerez <currojerez@riseup.net>
Acked-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

3bbaba0c

06 7月, 2015 2 次提交

drm/i915: Convert intel_lr_context_pin() for requests · 8ba319da

由 Mika Kuoppala 提交于 7月 03, 2015

Pass around requests to carry context deeper in callchain.
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8ba319da

drm/i915: Enable resource streamer on Execlists · 6922528a

由 Abdiel Janulgue 提交于 6月 16, 2015

GEN8 and above uses Execlists by default instead of the legacy
ringbuffer for batch execution. This patch enables the resource
streamer bits when required.

Patch is based on the initial work by Minu Mathai <minu.mathai@intel.com>
This version also adds the required bits to enable GEN8 Resource
Streamer context save and restore for Execlists.

Cc: ville.syrjala@linux.intel.com
Signed-off-by: NAbdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: NArun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6922528a

23 6月, 2015 4 次提交

drm/i915: Add *_ring_begin() to request allocation · ccd98fe4

由 John Harrison 提交于 5月 29, 2015

Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ccd98fe4

drm/i915: Update flush_all_caches() to take request structures · 4866d729

由 John Harrison 提交于 5月 29, 2015

Updated the *_ring_flush_all_caches() functions to take requests instead of
rings or ringbuf/context pairs.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4866d729

drm/i915: Merged the many do_execbuf() parameters into a structure · 5f19e2bf

由 John Harrison 提交于 5月 29, 2015

The do_execbuf() function takes quite a few parameters. The actual set of
parameters is going to change with the conversion to passing requests around.
Further, it is due to grow massively with the arrival of the GPU scheduler.

This patch simplifies the prototype by passing a parameter structure instead.
Changing the parameter set in the future is then simply a matter of
adding/removing items to the structure.

Note that the structure does not contain absolutely everything that is passed
in. This is because the intention is to use this structure more extensively
later in this patch series and more especially in the GPU scheduler that is
coming soon. The latter requires hanging on to the structure as the final
hardware submission can be delayed until long after the execbuf IOCTL has
returned to user land. Thus it is unsafe to put anything in the structure that
is local to the IOCTL call itself - such as the 'args' parameter. All entries
must be copies of data or pointers to structures that are reference counted in
some way and guaranteed to exist for the duration of the batch buffer's life.

v2: Rebased to newer tree and updated for changes to the command parser.
Specifically, a code shuffle has required saving the batch start address in the
params structure.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5f19e2bf

drm/i915: Set context in request from creation even in legacy mode · 40e895ce

由 John Harrison 提交于 5月 29, 2015

In execlist mode, the context object pointer is written in to the request
structure (and reference counted) at the point of request creation. In legacy
mode, this only happens inside i915_add_request().

This patch updates the legacy code path to match the execlist version. This
allows all the intermediate code between request creation and request submission
to get at the context object given only a request structure. Thus negating the
need to pass context pointers here, there and everywhere.

v2: Moved the context reference so it does not need to be undone if the
get_seqno() fails.

v3: Fixed execlist mode always hitting a warning about invalid last_contexts
(which don't exist in execlist mode).

v4: Updated for new i915_gem_request_alloc() scheme.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

40e895ce

01 4月, 2015 2 次提交

drm/i915: Move common request allocation code into a common function · 6689cb2b

由 John Harrison 提交于 3月 19, 2015

The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6689cb2b

drm/i915: Make intel_logical_ring_begin() static · bc0dce3f

由 John Harrison 提交于 3月 19, 2015

The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be
made static. To avoid a forward declaration at the top of the file, it and bunch
of other functions have been shuffled upwards.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

bc0dce3f

26 2月, 2015 1 次提交

drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc

由 John Harrison 提交于 2月 13, 2015

There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.

This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.

OTC-Jira: VIZ-1587
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8e004efc

24 2月, 2015 1 次提交

drm/i915: Reset logical ring contexts' head and tail during GPU reset · 3e5b6f05

由 Thomas Daniel 提交于 2月 16, 2015

Work was getting left behind in LRC contexts during reset.  This causes a hang
if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and
tail don't get reset and retiring a request doesn't alter them, so the ring
still appears full.

Added a function intel_lr_context_reset() to reset head and tail on a LRC and
its ringbuffer.

Call intel_lr_context_reset() for each context in i915_gem_context_reset() when
in execlists mode.

Testcase: igt/pm_rps --run-subtest reset #bdw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
[danvet: Flatten control flow in the lrc reset code a notch.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

3e5b6f05

14 2月, 2015 3 次提交

drm/i915: Make intel_logical_ring_advance_and_submit() static · 183c9906

由 Damien Lespiau 提交于 2月 10, 2015

This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

183c9906

drm/i915: Make intel_lr_context_render_state_init() static · cef437ad

由 Damien Lespiau 提交于 2月 10, 2015

This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: NDamien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cef437ad

drm/i915: Introduce bit definitions of CTXT_SR_CTRL register. · 5baa22c5

由 Zhi Wang 提交于 2月 10, 2015

This patch introduces 2 bit definitions of context save/restore
control register.
Signed-off-by: NZhi Wang <zhi.a.wang@intel.com>
Suggested-by: NDave Gordon <david.s.gordon@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5baa22c5

27 1月, 2015 3 次提交

drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request · 6d3d8274

由 Nick Hoath 提交于 1月 15, 2015

Move all remaining elements that were unique to execlists queue items
in to the associated request.

Issue: VIZ-4274

v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
Reviewed-by: NThomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6d3d8274

drm/i915: Remove FIXME_lrc_ctx backpointer · 21076372

由 Nick Hoath 提交于 1月 15, 2015

The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.

v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
Reviewed-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

21076372

drm/i915: Removed duplicate members from submit_request · 72f95afa

由 Nick Hoath 提交于 1月 15, 2015

Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.

Issue: VIZ-4274

v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: NNick Hoath <nicholas.hoath@intel.com>
Reviewed-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

72f95afa

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功