提交 · a6f766f3975185af66a31a2cea2cd38721645999 · openeuler / raspberrypi-kernel

21 5月, 2015 3 次提交

drm/i915: Limit ring synchronisation (sw sempahores) RPS boosts · a6f766f3

由 Chris Wilson 提交于 4月 27, 2015

Ring switches can occur many times per frame, and are often out of
control, causing frequent RPS boosting for no practical benefit. Treat
the sw semaphore synchronisation as a separate client and only allow it
to boost once per busy/idle cycle.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: s/rq/req/]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a6f766f3

drm/i915: Implement inter-engine read-read optimisations · b4716185

由 Chris Wilson 提交于 4月 27, 2015

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).

v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.

Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k:		275.794µs
After:  Time to read-read 1024k:		123.260µs

hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k:		230.433µs
After:  Time to read-read 1024k:		124.593µs

bdw-u (w/o semaphores):             Before          After
Time to read-read 1x1:            26.274µs       10.350µs
Time to read-read 128x128:        40.097µs       21.366µs
Time to read-read 256x256:        77.087µs       42.608µs
Time to read-read 512x512:       281.999µs      181.155µs
Time to read-read 1024x1024:    1196.141µs     1118.223µs
Time to read-read 2048x2048:    5639.072µs     5225.837µs
Time to read-read 4096x4096:   22401.662µs    21137.067µs
Time to read-read 8192x8192:   89617.735µs    85637.681µs

Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

b4716185

drm/i915: s/\<rq\>/req/g · eed29a5b

由 Daniel Vetter 提交于 5月 21, 2015

The merged seqno->request conversion from John called request
variables req, but some (not all) of Chris' recent patches changed
those to just rq. We've had a lenghty (and inconclusive) discussion on
irc which is the more meaningful name with maybe at most a slight bias
towards req.

Given that the "don't change names without good reason to avoid
conflicts" rule applies, so lets go back to a req everywhere for
consistency. I'll sed any patches for which this will cause conflicts
before applying.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
[danvet: s/origina/merged/ as pointed out by Chris - the first
mass-conversion patch was from Chris, the merged one from John.]
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

eed29a5b

20 5月, 2015 2 次提交

drm/i915: Remove domain flubbing from i915_gem_object_finish_gpu() · 2e2f351d

由 Chris Wilson 提交于 4月 27, 2015

We no longer interpolate domains in the same manner, and even if we did,
we should trust setting either of the other write domains would trigger
an invalidation rather than force it. Remove the tweaking of the
read_domains since it serves no purpose and use
i915_gem_object_wait_rendering() directly.

Note that this goes back to

commit a8198eea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 13 22:04:09 2011 +0100

    drm/i915: Introduce i915_gem_object_finish_gpu()

and gpu domain tracking died in

commit cc889e0f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

which is more than 1 year older.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Add notes with information dug out of git history.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2e2f351d

drm/i915: Remove unused variable from i915_gem_mmap_gtt · 5e024f31

由 Daniel Vetter 提交于 5月 08, 2015

Lost in

commit c5ad54cf
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Wed May 6 14:36:09 2015 +0300

    drm/i915: Use partial view in mmap fault handler

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

5e024f31

08 5月, 2015 6 次提交

drm/i915: Reject huge tiled objects · e7ded2d7

由 Joonas Lahtinen 提交于 5月 08, 2015

We do not yet support tiled objects bigger than the mappable
aperture size so reject them.
Reported-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Rework the check a bit to avoid warnings.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e7ded2d7

drm/i915: Use partial view in mmap fault handler · c5ad54cf

由 Joonas Lahtinen 提交于 5月 06, 2015

Use partial view for huge BOs (bigger than half the mappable aperture)
in fault handler so that they can be accessed withough trying to make
room for them by evicting other objects.

v2:
- Only use partial views in the case where early rejection was
  previously done.
- Account variable type changes from previous reroll.
v3:
- Add a comment about overwriting existing page entries.
  (Tvrtko Ursulin)
- Whitespace fixes.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

c5ad54cf

drm/i915: Consider object pinned if any VMA is pinned · a6631ae1

由 Joonas Lahtinen 提交于 5月 06, 2015

Do not skip special GGTT views when considering whether an object
is pinned or not.

Wrong behaviour was introduced in;

commit ec7adb6e
Author: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Date:   Mon Mar 16 14:11:13 2015 +0200

    drm/i915: Do not use ggtt_view with (aliasing) PPGTT

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a6631ae1

drm/i915: Do not make assumptions on GGTT VMA sizes · 91e6711e

由 Joonas Lahtinen 提交于 5月 06, 2015

GGTT VMA sizes might be smaller than the whole object size due to
different GGTT views.

v2:
- Separate GGTT view constraint calculations from normal view
  constraint calculations (Chris Wilson)
v3:
- Do not bother with debug wording. (Tvrtko Ursulin)
v4:
- Clearer logic for calculating map_and_fenceable (Tvrtko Ursulin)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop BUG_ON, it's redudant.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

91e6711e

drm/i915: Remove locking for get-caching query · 432be69d

由 Chris Wilson 提交于 5月 07, 2015

Reading a single value from the object, the locking only provides futile
protection against userspace races. The locking is useless so remove it.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

432be69d

drm/i915: Clear vma->bound on unbinding · 5e562f1d

由 Mika Kuoppala 提交于 4月 30, 2015

Unbinding doesn't always lead to unconditional destruction
of vma. This destruction avoidance happens if vma is part of
execbuffer relocation list or if vma is being considered for
eviction in i915_gem_evict_something().

For those other users, mark the vma unbound so that
the correct state of this vma is preserved.
Reported-by: NChris Wilson <chris@chris-wilson.co.ok>
Cc: Chris Wilson <chris@chris-wilson.co.ok>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5e562f1d

24 4月, 2015 3 次提交

drm/i915: Workaround to avoid lite restore with HEAD==TAIL · 53292cdb

由 Michel Thierry 提交于 4月 15, 2015

WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.

Add two extra MI_NOOP instructions at the end of each request, but keep
the requests tail pointing before the MI_NOOPs. We may not need to
executed them, and this is why request->tail is sampled before adding
these extra instructions.

If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.

v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).

v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).

v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NThomas Daniel <thomas.daniel@intel.com>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

53292cdb

drm/i915: Fix up the vma aliasing ppgtt binding · 0875546c

由 Daniel Vetter 提交于 4月 20, 2015

Currently we have the problem that the decision whether ptes need to
be (re)written is splattered all over the codebase. Move all that into
i915_vma_bind. This needs a few changes:
- Just reuse the PIN_* flags for i915_vma_bind and do the conversion
  to vma->bound in there to avoid duplicating the conversion code all
  over.
- We need to make binding for EXECBUF (i.e. pick aliasing ppgtt if
  around) explicit, add PIN_USER for that.
- Two callers want to update ptes, give them a PIN_UPDATE for that.

Of course we still want to avoid double-binding, but that should be
taken care of:
- A ppgtt vma will only ever see PIN_USER, so no issue with
  double-binding.
- A ggtt vma with aliasing ppgtt needs both types of binding, and we
  track that properly now.
- A ggtt vma without aliasing ppgtt could be bound twice. In the
  lower-level ->bind_vma functions hence unconditionally set
  GLOBAL_BIND when writing the ggtt ptes.

There's still a bit room for cleanup, but that's for follow-up
patches.

v2: Fixup fumbles.

v3: s/PIN_EXECBUF/PIN_USER/ for clearer meaning, suggested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

0875546c

drm/i915: Remove misleading comment around bind_to_vm · cd102a68

由 Daniel Vetter 提交于 4月 14, 2015

It's true that we might need to context switch, but both the signalling
and implementation of the same are a few source files away. Remove it.
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

cd102a68

20 4月, 2015 2 次提交

drm/i915: Move vma vfuns to adddress_space · 777dc5bb

由 Daniel Vetter 提交于 4月 14, 2015

They change with the address space and not with each vma, so move them
into the right pile of vfuncs. Save 2 pointers per vma and clarifies
the code.
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com>

777dc5bb

drm/i915: Simplify and fix object to display tracking · 8a0c39b1

由 Tvrtko Ursulin 提交于 4月 13, 2015

Purpose of this tracking is to know when to flush the cache between
the CPU and the non-coherent display engine. Prior to:

   commit 121920fa
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Mon Mar 23 11:10:37 2015 +0000

       drm/i915/skl: Query display address through a wrapper

This worked by a mix of direct flag manipulation and checking for
existence of a pinned GGTT VMA.

With the introduction of rotated display mappings this approach is
no longer correct.

New simpler approach is to just keep this count over calls which pin
and unpin objects to and from display, at the slight cost of extra
space in every bo.

(Inspired and extracted code from a larger rework by Chris Wilson.)

v2: Remove the limit since it is not well defined. (Chris Wilson, Ville Syrjälä)
v3: Commit message corrections. (Chris Wilson)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8a0c39b1

16 4月, 2015 1 次提交

drm/i915: Fix view type in warning message · 5678ad73

由 Tvrtko Ursulin 提交于 3月 17, 2015

One month passed between posting a patch and it getting merged, and
unfortunately even though it still applies, it needs fixing to account
for changes in function parameters since:

   commit d385612e15b8b6eb3db328d83f1872ef8a381788
   Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
   Date:   Tue Mar 17 14:45:29 2015 +0000

       drm/i915: Log view type when printing warnings
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
[danvet: Squash in fixup from Tvrtko to fix the rebase conflict.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

5678ad73

13 4月, 2015 2 次提交

drm/i915: Remove obj->pin_mappable · 30154650

由 Chris Wilson 提交于 4月 07, 2015

The obj->pin_mappable flag only exists for debug purposes and is a
hindrance that is mistreated with rotated GGTT views. For debug
purposes, it suffices to mark objects with pin_display as being of note.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

30154650

drm/i915: Optimistically spin for the request completion · 2def4ad9

由 Chris Wilson 提交于 4月 07, 2015

This provides a nice boost to mesa in swap bound scenarios (as mesa
throttles itself to the previous frame and given the scenario that will
complete shortly). It will also provide a good boost to systems running
with semaphores disabled and so frequently waiting on the GPU as it
switches rings. In the most favourable of microbenchmarks, this can
increase performance by around 15% - though in practice improvements
will be marginal and rarely noticeable.

v2: Account for user timeouts
v3: Limit the spinning to a single jiffie (~1us) at most. On an
otherwise idle system, there is no scheduler contention and so without a
limit we would spin until the GPU is ready.
v4: Drop forcewake - the lazy coherent access doesn't require it, and we
have no reason to believe that the forcewake itself improves seqno
coherency - it only adds delay.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: "Rantala, Valtteri" <valtteri.rantala@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

2def4ad9

10 4月, 2015 12 次提交

drm/i915: Move vm page allocation in proper place · 1d335d1b

由 Mika Kuoppala 提交于 4月 10, 2015

Move to i915_vma_bind as it is part of the binding.
Suggested-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1d335d1b

drm/i915: Remove request->uniq · d7b9ca2f

由 Chris Wilson 提交于 4月 07, 2015

We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.

Fixes regression from
commit b3a38998
Author: Nick Hoath <nicholas.hoath@intel.com>
Date:   Thu Feb 19 16:30:47 2015 +0000

    drm/i915: Fix a use after free, and unbalanced refcounting

v2: Rebase
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

d7b9ca2f

C
drm/i915: Prefer to check for idleness in worker rather than sync-flush · 423795cb
由 Chris Wilson 提交于 4月 07, 2015
```
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
```
423795cb

drm/i915: Use a separate slab for vmas · e20d2ab7

由 Chris Wilson 提交于 4月 07, 2015

vma are more frequently allocated than objects and so should equally
benefit from having a dedicated slab.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e20d2ab7

drm/i915: Use a separate slab for requests · efab6d8d

由 Chris Wilson 提交于 4月 07, 2015

requests are even more frequently allocated than objects and equally
benefit from having a dedicated slab.

v2: Rebase
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

efab6d8d

drm/i915: Use the global runtime-pm wakelock for a busy GPU for execlists · 4bb1bedb

由 Chris Wilson 提交于 4月 07, 2015

When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.

Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.

To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

4bb1bedb

drm/i915: Split batch pool into size buckets · 8d9d5744

由 Chris Wilson 提交于 4月 07, 2015

Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.

v2: Add a comment about bucket sizes
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

8d9d5744

drm/i915: Free batch pool when idle · 35c94185

由 Chris Wilson 提交于 4月 07, 2015

At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

35c94185

drm/i915: Split the batch pool by engine · 06fbca71

由 Chris Wilson 提交于 4月 07, 2015

I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.

One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.

v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

06fbca71

drm/i915: Re-enable RPS wait-boosting for all engines · 7c27f525

由 Chris Wilson 提交于 4月 07, 2015

This reverts commit ec5cc0f9
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Jun 12 10:28:55 2014 +0100

    drm/i915: Restrict GPU boost to the RCS engine

The premise that media/blitter workloads are not affected by boosting is
patently false with a trip through igt. The question that remains is
what exactly is going wrong with the media workload that prompted this?
Hopefully that would be fixed by the missing agressive downclocking, in
addition to the extra restrictions imposed on how frequent a process is
allowed to boost.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll>
Acked-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

7c27f525

drm/i915: Deminish contribution of wait-boosting from clients · 1854d5ca

由 Chris Wilson 提交于 4月 07, 2015

With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: NDeepak S <deepak.s@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

1854d5ca

drm/i915: Cache last obj->pages location for i915_gem_object_get_page() · ee286370

由 Chris Wilson 提交于 4月 07, 2015

The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improve from the extra layer of caching.

v2: Avoid the double increment inside unlikely()

References: https://bugs.freedesktop.org/show_bug.cgi?id=88308Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

ee286370

01 4月, 2015 2 次提交

drm/i915: Move common request allocation code into a common function · 6689cb2b

由 John Harrison 提交于 3月 19, 2015

The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.

This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.

For: VIZ-5190
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

6689cb2b

drm/i915: Rename 'do_execbuf' to 'execbuf_submit' · f3dc74c0

由 John Harrison 提交于 3月 19, 2015

The submission portion of the execbuffer code path was abstracted into a
function pointer indirection as part of the legacy vs execlist work. The two
implementation functions are called 'i915_gem_ringbuffer_submission' and
'intel_execlists_submission' but the pointer was called 'do_execbuf'. There is
already a 'i915_gem_do_execbuffer' function (which is what calls the pointer
indirection). The name of the pointer is therefore considered to be backwards
and should be changed.

This patch renames it to 'execbuf_submit' which is hopefully a bit clearer.

For: VIZ-5115
Signed-off-by: NJohn Harrison <John.C.Harrison@Intel.com>
Reviewed-by: NTomas Elf <tomas.elf@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

f3dc74c0

30 3月, 2015 1 次提交

drm/i915: Add i915_gem_request_unreference__unlocked · 41037f9f

由 Chris Wilson 提交于 3月 27, 2015

We were missing a convenience stub to aquire the right mutex whilst
dropping the request, so add it.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

41037f9f

27 3月, 2015 2 次提交

drm/i915: Compare GGTT view structs instead of types · 9abc4648

由 Joonas Lahtinen 提交于 3月 27, 2015

To allow for views where the view type is not defined by the view type only,
like it is in stereo or rotated 90 degree view, change the semantic to require
the whole view structure for comparison when we match a GGTT view.

This allows including parameters like offset to be included in the view which
is useful for eg. partial views.

v3:
- Rely on ggtt_view type being 0 for non-GGTT vma's, which equals to
  I915_GGTT_VIEW_NORMAL. (Daniel Vetter)
- Do not use potentially slower comparison when we only want to know if
  something is or is not a normal view.
- Rebase on top of rotated view patches. Add rotated view singleton.
- If one view is missing in comparison they're equal only if both are missing.

v4:
- Use comparison helper in obj_to_ggtt_view too. (Tvrtko Ursulin)
- Do WARN_ON if one view is NULL. (Tvrtko Ursulin)

Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

9abc4648

drm/i915: Add dynamic page trace events · 72744cb1

由 Michel Thierry 提交于 3月 24, 2015

Traces for page directories and tables allocation and map.

v2: Removed references to teardown.
v3: bitmap_scnprintf has been deprecated.
v4: Replace bitmap_scnprintf with scnprintf correctly, and get right
range lengths. (Mika)

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: NMichel Thierry <michel.thierry@intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

72744cb1

26 3月, 2015 1 次提交

drm/i915: Keep ring->active_list and ring->requests_list consistent · 832a3aad

由 Chris Wilson 提交于 3月 18, 2015

If we retire requests last, we may use a later seqno and so clear
the requests lists without clearing the active list, leading to
confusion. Hence we should retire requests first for consistency with
the early return. The order used to be important as the lifecycle for
the object on the active list was determined by request->seqno. However,
the requests themselves are now reference counted removing the
constraint from the order of retirement.

Fixes regression from

commit 1b5a433a
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Mon Nov 24 18:49:42 2014 +0000

    drm/i915: Convert 'i915_seqno_passed' calls into 'i915_gem_request_completed
'

and a

	WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
	WARN_ON(!list_empty(&vm->active_list))

Identified by updating WATCH_LISTS:

	[drm:i915_verify_lists] *ERROR* blitter ring: active list not empty, but no requests
	WARNING: CPU: 0 PID: 681 at drivers/gpu/drm/i915/i915_gem.c:2751 i915_gem_retire_requests_ring+0x149/0x230()
	WARN_ON(i915_verify_lists(ring->dev))

Note that this is only a problem in evict_vm where the following happens
after a retire_request has cleaned out all requests, but not all active
bo:
- intel_ring_idle called from i915_gpu_idle notices that no requests are
  outstanding and immediately returns.
- i915_gem_retire_requests_ring called from i915_gem_retire_requests also
  immediately returns when there's no request, still leaving the bo on the
  active list.
- evict_vm hits the WARN_ON(!list_empty(&vm->active_list)) after evicting
  all active objects that there's still stuff left that shouldn't be
  there.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NJani Nikula <jani.nikula@intel.com>

832a3aad

23 3月, 2015 2 次提交

drm/i915/skl: Support secondary (rotated) frame buffer mapping · 50470bb0

由 Tvrtko Ursulin 提交于 3月 23, 2015

90/270 rotated scanout needs a rotated GTT view of the framebuffer.

This is put in a separate VMA with a dedicated ggtt view and wired such that
it is created when a framebuffer is pinned to a 90/270 rotated plane.

Rotation is only possible with Yb/Yf buffers and error is propagated to
user space in case of a mismatch.

Special rotated page view is constructed at the VMA creation time by
borrowing the DMA addresses from obj->pages.

v2:
    * Do not bother with pages for rotated sg list, just populate the DMA
      addresses. (Daniel Vetter)
    * Checkpatch cleanup.

v3:
    * Rebased on top of new plane handling (create rotated mapping when
      setting the rotation property).
    * Unpin rotated VMA on unpinning from display plane.
    * Simplify rotation check using bitwise AND. (Chris Wilson)

v4:
    * Fix unpinning of optional rotated mapping so it is really considered
      to be optional.

v5:
   * Rebased for fb modifier changes.
   * Rebased for atomic commit.
   * Only pin needed view for display. (Ville Syrjälä, Daniel Vetter)

v6:
   * Rebased after preparatory work has been extracted out. (Daniel Vetter)

v7:
   * Slightly simplified tiling geometry calculation.
   * Moved rotated GGTT view implementation into i915_gem_gtt.c (Daniel Vetter)

v8:
   * Do not use i915_gem_obj_size to get object size since that actually
     returns the size of an VMA which may not exist.
   * Rebased for ggtt view changes.

v9:
   * Rebased after code review changes on the preceding patches.
   * Tidy function definitions. (Joonas Lahtinen)

For: VIZ-4726
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com> (v4)
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

50470bb0

drm/i915: Use GGTT view when (un)pinning objects to planes · e6617330

由 Tvrtko Ursulin 提交于 3月 23, 2015

To support frame buffer rotation we need to be able to pass on the information
on what kind of GGTT view is required for display.

This patch just adds the parameter and makes all the callers default to the
normal view.

v2: Rebased for ggtt view changes.
v3: Don't limit PIN_MAPPABLE to normal views just yet. (Joonas Lahtinen)
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (v3)
[danvet: s/BUG/WARN/ in the patch hunk because. At least where the
BUG_ON isn't fatal right away.]
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

e6617330

20 3月, 2015 1 次提交

drm/i915: Track page table reload need · 563222a7

由 Ben Widawsky 提交于 3月 19, 2015

This patch was formerly known as, "Force pd restore when PDEs change,
gen6-7." I had to change the name because it is needed for GEN8 too.

The real issue this is trying to solve is when a new object is mapped
into the current address space. The GPU does not snoop the new mapping
so we must do the gen specific action to reload the page tables.

GEN8 and GEN7 do differ in the way they load page tables for the RCS.
GEN8 does so with the context restore, while GEN7 requires the proper
load commands in the command streamer. Non-render is similar for both.

Caveat for GEN7
The docs say you cannot change the PDEs of a currently running context.
We never map new PDEs of a running context, and expect them to be
present - so I think this is okay. (We can unmap, but this should also
be okay since we only unmap unreferenced objects that the GPU shouldn't
be tryingto va->pa xlate.) The MI_SET_CONTEXT command does have a flag
to signal that even if the context is the same, force a reload. It's
unclear exactly what this does, but I have a hunch it's the right thing
to do.

The logic assumes that we always emit a context switch after mapping new
PDEs, and before we submit a batch. This is the case today, and has been
the case since the inception of hardware contexts. A note in the comment
let's the user know.

It's not just for gen8. If the current context has mappings change, we
need a context reload to switch

v2: Rebased after ppgtt clean up patches. Split the warning for aliasing
and true ppgtt options. And do not break aliasing ppgtt, where to->ppgtt
is always null.

v3: Invalidate PPGTT TLBs inside alloc_va_range.

v4: Rename ppgtt_invalidate_tlbs to mark_tlbs_dirty and move
pd_dirty_rings from i915_address_space to i915_hw_ppgtt. Fixes when
neither ctx->ppgtt and aliasing_ppgtt exist.

v5: Removed references to teardown_va_range.

v6: Updated needs_pd_load_pre/post.

v7: Fix pd_dirty_rings check in needs_pd_load_post, and update/move
comment about updated PDEs to object_pin/bind (Mika).

Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: NBen Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

563222a7