提交 · 0b076ecdf343b029c4c2c3a94ffd0199d97aa46c · openeuler / raspberrypi-kernel

22 6月, 2015 1 次提交

drm/i915: Remove unused ring argument from frontbuffer invalidate and busy functions. · 77a0d1ca

由 Rodrigo Vivi 提交于 6月 18, 2015

This patch doesn't have any functional change, but organize fruntbuffer
invalidate and busy by removing unecesarry signature argument for ring.

It was unsed on mark_fb_busy and only used on fb_obj_invalidate for the
same ORIGIN_CS usage. So let's clean it a bit

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

77a0d1ca

15 6月, 2015 1 次提交

drm/i915: Don't skip request retirement if the active list is empty · 11ee9615

由 Ville Syrjälä 提交于 5月 28, 2015

Apparently we can have requests even if though the active list is empty,
so do the request retirement regardless of whether there's anything
on the active list.

The way it happened here is that during suspend intel_ring_idle()
notices the olr hanging around and then proceeds to get rid of it by
adding a request. However since there was nothing on the active lists
i915_gem_retire_requests() didn't clean those up, and so the idle work
never runs, and we leave the GPU "busy" during suspend resulting in a
WARN later.
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

11ee9615

27 5月, 2015 1 次提交

drm/i915: Use spinlocks for checking when to waitboost · 8d3afd7d

由 Chris Wilson 提交于 5月 21, 2015

In commit 1854d5ca
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:32 2015 +0100

    drm/i915: Deminish contribution of wait-boosting from clients

we removed an atomic timer based check for allowing waitboosting and
moved it below the mutex taken during RPS. However, that mutex can be
held for long periods of time on Vallyview/Cherryview as communication
with the PCU is slow. As clients may frequently wait for results (e.g.
such as tranform feedback) we introduced contention between the client
and the RPS worker. We can take advantage of the RPS worker, by
switching the wait boost decision to use spin locks and defer the
actual reclocking to the worker.

Fixes a regression of up to 45% on Baytrail and Baswell!

v2 (Daniel):
- Use max_freq_softlimit instead of the not-yet-merged boost
  frequency.
- Don't inject a fake irq into the boost work, instead treat
  client_boost as just another legit waker.

v3: Drop the now unused mask (Chris).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8d3afd7d

22 5月, 2015 3 次提交

drm/i915: Introduce DRM_I915_THROTTLE_JIFFIES · d0bc54f2

由 Chris Wilson 提交于 5月 21, 2015

As Daniel commented on

commit b7ffe1362c5f468b853223acc9268804aa92afc8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Apr 27 13:41:24 2015 +0100

    drm/i915: Free RPS boosts for all laggards

it is better to be explicit when sharing hardcoded values such as
throttle/boost timeouts. Make it so!
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d0bc54f2

drm/i915: Use the correct destructor for freeing requests on error · 9a0c1e27

由 Chris Wilson 提交于 5月 21, 2015

After allocating from the slab cache, we then need to free the request
back into the slab cache upon error (and not call kfree as that leads
to eventual memory corruption).

Fixes regression from
commit efab6d8d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Apr 7 16:20:57 2015 +0100

    drm/i915: Use a separate slab for requests
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9a0c1e27

drm/i915: Free RPS boosts for all laggards · e61b9958

由 Chris Wilson 提交于 4月 27, 2015

If the client stalls on a congested request, chosen to be 20ms old to
match throttling, allow the client a free RPS boost.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
[danvet: s/0/NULL/ reported by 0-day build]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e61b9958

21 5月, 2015 4 次提交

drm/i915: Convert RPS tracking to a intel_rps_client struct · 2e1b8730

由 Chris Wilson 提交于 4月 27, 2015

Now that we have internal clients, rather than faking a whole
drm_i915_file_private just for tracking RPS boosts, create a new struct
intel_rps_client and pass it along when waiting.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2e1b8730

drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts · a6f766f3

由 Chris Wilson 提交于 4月 27, 2015

Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a6f766f3

drm/i915: Implement inter-engine read-read optimisations · b4716185

由 Chris Wilson 提交于 4月 27, 2015

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b4716185

drm/i915: s/\<rq\>/req/g · eed29a5b

由 Daniel Vetter 提交于 5月 21, 2015

The merged seqno->request conversion from John called request
variables req, but some (not all) of Chris' recent patches changed
those to just rq. We've had a lenghty (and inconclusive) discussion on
irc which is the more meaningful name with maybe at most a slight bias
towards req.

Given that the "don't change names without good reason to avoid
conflicts" rule applies, so lets go back to a req everywhere for
consistency. I'll sed any patches for which this will cause conflicts
before applying.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
[danvet: s/origina/merged/ as pointed out by Chris - the first
mass-conversion patch was from Chris, the merged one from John.]
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

eed29a5b

20 5月, 2015 2 次提交

drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() · 2e2f351d

由 Chris Wilson 提交于 4月 27, 2015

We no longer interpolate domains in the same manner, and even if we did,
we should trust setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.

Note that this goes back to

commit a8198eea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 22:04:09 2011 +0100

    drm/i915: Introduce i915_gem_object_finish_gpu()

and gpu domain tracking died in

commit cc889e0f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which is more than 1 year older.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Add notes with information dug out of git history.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2e2f351d

drm/i915: Remove unused variable from i915_gem_mmap_gtt · 5e024f31

由 Daniel Vetter 提交于 5月 08, 2015

Lost in

commit c5ad54cf
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Wed May 6 14:36:09 2015 +0300

    drm/i915: Use partial view in mmap fault handler

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

5e024f31

08 5月, 2015 6 次提交

drm/i915: Reject huge tiled objects · e7ded2d7

由 Joonas Lahtinen 提交于 5月 08, 2015

We do not yet support tiled objects bigger than the mappable
aperture size so reject them.
Reported-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Rework the check a bit to avoid warnings.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e7ded2d7

drm/i915: Use partial view in mmap fault handler · c5ad54cf

由 Joonas Lahtinen 提交于 5月 06, 2015

Use partial view for huge BOs (bigger than half the mappable aperture)
in fault handler so that they can be accessed withough trying to make
room for them by evicting other objects.

v2:
- Only use partial views in the case where early rejection was
  previously done.
- Account variable type changes from previous reroll.
v3:
- Add a comment about overwriting existing page entries.
  (Tvrtko Ursulin)
- Whitespace fixes.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

c5ad54cf

drm/i915: Consider object pinned if any VMA is pinned · a6631ae1

由 Joonas Lahtinen 提交于 5月 06, 2015

Do not skip special GGTT views when considering whether an object
is pinned or not.

Wrong behaviour was introduced in;

commit ec7adb6e
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Mon Mar 16 14:11:13 2015 +0200

    drm/i915: Do not use ggtt_view with (aliasing) PPGTT

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a6631ae1

drm/i915: Do not make assumptions on GGTT VMA sizes · 91e6711e

由 Joonas Lahtinen 提交于 5月 06, 2015

GGTT VMA sizes might be smaller than the whole object size due to
different GGTT views.

v2:
- Separate GGTT view constraint calculations from normal view
  constraint calculations (Chris Wilson)
v3:
- Do not bother with debug wording. (Tvrtko Ursulin)
v4:
- Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop BUG_ON, it's redudant.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

91e6711e

drm/i915: Remove locking for get-caching query · 432be69d

由 Chris Wilson 提交于 5月 07, 2015

Reading a single value from the object, the locking only provides futile
protection against userspace races. The locking is useless so remove it.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

432be69d

drm/i915: Clear vma->bound on unbinding · 5e562f1d

由 Mika Kuoppala 提交于 4月 30, 2015

Unbinding doesn't always lead to unconditional destruction
of vma. This destruction avoidance happens if vma is part of
execbuffer relocation list or if vma is being considered for
eviction in i915_gem_evict_something().

For those other users, mark the vma unbound so that
the correct state of this vma is preserved.
Reported-by: NChris Wilson <chris@chris-wilson.co.ok>
Cc: Chris Wilson <chris@chris-wilson.co.ok>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5e562f1d

24 4月, 2015 3 次提交

drm/i915: Workaround to avoid lite restore with HEAD==TAIL · 53292cdb

由 Michel Thierry 提交于 4月 15, 2015

WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the requests tail pointing before the MI_NOOPs. We may not need to
executed them, and this is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

53292cdb

drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c

由 Daniel Vetter 提交于 4月 20, 2015

Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
  to vma->bound in there to avoid duplicating the conversion code all
  over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
  around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.

Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
  double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
  track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
  lower-level ->bind_vma functions hence unconditionally set
  GLOBAL_BIND when writing the ggtt ptes.

There's still a bit room for cleanup, but that's for follow-up
patches.

v2: Fixup fumbles.

v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

0875546c

drm/i915: Remove misleading comment around bind_to_vm · cd102a68

由 Daniel Vetter 提交于 4月 14, 2015

It's true that we might need to context switch, but both the signalling
and implementation of the same are a few source files away. Remove it.
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cd102a68

20 4月, 2015 2 次提交

drm/i915: Move vma vfuns to adddress_space · 777dc5bb

由 Daniel Vetter 提交于 4月 14, 2015

They change with the address space and not with each vma, so move them
into the right pile of vfuncs. Save 2 pointers per vma and clarifies
the code.
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

777dc5bb

drm/i915: Simplify and fix object to display tracking · 8a0c39b1

由 Tvrtko Ursulin 提交于 4月 13, 2015

Purpose of this tracking is to know when to flush the cache between
the CPU and the non-coherent display engine. Prior to:

   commit 121920fa
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Mon Mar 23 11:10:37 2015 +0000

       drm/i915/skl: Query display address through a wrapper

This worked by a mix of direct flag manipulation and checking for
existence of a pinned GGTT VMA.

With the introduction of rotated display mappings this approach is
no longer correct.

New simpler approach is to just keep this count over calls which pin
and unpin objects to and from display, at the slight cost of extra
space in every bo.

(Inspired and extracted code from a larger rework by Chris Wilson.)

v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä)
v3: Commit message corrections. (Chris Wilson)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8a0c39b1

16 4月, 2015 1 次提交

drm/i915: Fix view type in warning message · 5678ad73

由 Tvrtko Ursulin 提交于 3月 17, 2015

One month passed between posting a patch and it getting merged, and
unfortunately even though it still applies, it needs fixing to account
for changes in function parameters since:

   commit d385612e15b8b6eb3db328d83f1872ef8a381788
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Tue Mar 17 14:45:29 2015 +0000

       drm/i915: Log view type when printing warnings
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5678ad73

13 4月, 2015 2 次提交

drm/i915: Remove obj->pin_mappable · 30154650

由 Chris Wilson 提交于 4月 07, 2015

The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mistreated with rotated GGTT views. For debug
purposes, it suffices to mark objects with pin_display as being of note.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

30154650

drm/i915: Optimistically spin for the request completion · 2def4ad9

由 Chris Wilson 提交于 4月 07, 2015

This provides a nice boost to mesa in swap bound scenarios (as mesa
throttles itself to the previous frame and given the scenario that will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.

v2: Account for user timeouts
v3: Limit the spinning to a single jiffie (~1us) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2def4ad9

10 4月, 2015 12 次提交

drm/i915: Move vm page allocation in proper place · 1d335d1b

由 Mika Kuoppala 提交于 4月 10, 2015

Move to i915_vma_bind as it is part of the binding.
Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1d335d1b

drm/i915: Remove request->uniq · d7b9ca2f

由 Chris Wilson 提交于 4月 07, 2015

We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.

Fixes regression from
commit b3a38998
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d7b9ca2f

C
drm/i915: Prefer to check for idleness in worker rather than sync-flush · 423795cb
由 Chris Wilson 提交于 4月 07, 2015
```
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
```
423795cb

drm/i915: Use a separate slab for vmas · e20d2ab7

由 Chris Wilson 提交于 4月 07, 2015

vma are more frequently allocated than objects and so should equally
benefit from having a dedicated slab.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e20d2ab7

drm/i915: Use a separate slab for requests · efab6d8d

由 Chris Wilson 提交于 4月 07, 2015

requests are even more frequently allocated than objects and equally
benefit from having a dedicated slab.

v2: Rebase
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

efab6d8d

drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists · 4bb1bedb

由 Chris Wilson 提交于 4月 07, 2015

When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4bb1bedb

drm/i915: Split batch pool into size buckets · 8d9d5744

由 Chris Wilson 提交于 4月 07, 2015

Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.

v2: Add a comment about bucket sizes
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8d9d5744

drm/i915: Free batch pool when idle · 35c94185

由 Chris Wilson 提交于 4月 07, 2015

At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

35c94185

drm/i915: Split the batch pool by engine · 06fbca71

由 Chris Wilson 提交于 4月 07, 2015

I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

06fbca71

drm/i915: Re-enable RPS wait-boosting for all engines · 7c27f525

由 Chris Wilson 提交于 4月 07, 2015

This reverts commit ec5cc0f9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting is
patently false with a trip through igt. The question that remains is
what exactly is going wrong with the media workload that prompted this?
Hopefully that would be fixed by the missing agressive downclocking, in
addition to the extra restrictions imposed on how frequent a process is
allowed to boost.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll>
Acked-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7c27f525

drm/i915: Deminish contribution of wait-boosting from clients · 1854d5ca

由 Chris Wilson 提交于 4月 07, 2015

With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1854d5ca

drm/i915: Cache last obj->pages location for i915_gem_object_get_page() · ee286370

由 Chris Wilson 提交于 4月 07, 2015

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improve from the extra layer of caching.

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ee286370

01 4月, 2015 2 次提交

drm/i915: Move common request allocation code into a common function · 6689cb2b

由 John Harrison 提交于 3月 19, 2015

The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6689cb2b

drm/i915: Rename 'do_execbuf' to 'execbuf_submit' · f3dc74c0

由 John Harrison 提交于 3月 19, 2015

The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.

This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f3dc74c0