1. 18 Mar 2015, 2 commits
  2. 26 Feb 2015, 1 commit
    • drm/i915: Rename 'flags' to 'dispatch_flags' for better code reading · 8e004efc
      John Harrison authored
      There is a flags word that is passed through the execbuffer code path all the
      way from the initial decoding of the user parameters down to the very final
      dispatch buffer call. It is simply called 'flags'. Unfortunately, there are many
      other flags words floating around in the same blocks of code, and even more will
      appear once the GPU scheduler arrives.
      
      This patch makes it more obvious exactly which flags word is which by renaming
      'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
      already have an 'I915_DISPATCH_' prefix on them and so are not quite so
      ambiguous.
      
      OTC-Jira: VIZ-1587
      Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
      [danvet: Resolve conflict with Chris' rework of the bb parsing.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8e004efc
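
      A minimal sketch of the idea (the I915_DISPATCH_* names and the 1<<0 / 1<<1
      values are believed to match the i915_drv.h of this era, but the function below
      is a simplified stand-in, not the driver's real dispatch hook):

          #include <stdint.h>

          /* Stand-in declaration so the sketch is self-contained. */
          struct intel_engine_cs;

          /* The dispatch bits already carry the I915_DISPATCH_ prefix. */
          #define I915_DISPATCH_SECURE (1u << 0)
          #define I915_DISPATCH_PINNED (1u << 1)

          /* After the rename, the word handed down to the final dispatch call is
           * unmistakably 'dispatch_flags', never a bare 'flags'. */
          static int ring_dispatch_execbuffer(struct intel_engine_cs *ring,
                                              uint64_t exec_start, uint32_t exec_len,
                                              unsigned int dispatch_flags)
          {
              (void)ring; (void)exec_start; (void)exec_len;
              if (dispatch_flags & I915_DISPATCH_SECURE) {
                  /* emit the batch-buffer start as privileged */
              }
              return 0;
          }
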
  3. 24 Feb 2015, 1 commit
  4. 27 Jan 2015, 1 commit
    • drm/i915: Specify bsd rings through exec flag · 8d360dff
      Zhipeng Gong authored
      On Skylake GT3 we have 2 Video Command Streamers (VCS), which are asymmetrical.
      For example, HEVC GPU commands can only be dispatched to the VCS1 ring.
      But userspace has no control over whether VCS1 or VCS2 is used. This patch
      introduces a mechanism to avoid the default ping-pong mode and select one
      specific ring through an execution flag. The mechanism is usable on all
      platforms with 2 VCS rings.
      
      The open source usage is from these two commits in vaapi/intel:
      	commit 702050f04131a44ef8ac16651708ce8a8d98e4b8
      	Author: Zhao, Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:19 2014 +0800
      
      	    Allow the batchbuffer to be submitted with override flag
      
      	commit a56efcdf27d11ad9b21664b4a2cda72d7f90f5a8
      	Author: Zhao Yakui <yakui.zhao@intel.com>
      	Date:   Mon Nov 17 12:44:22 2014 +0800
      
      	    Add the override flag to assure that HEVC video command
      		always uses BSD ring0 for SKL GT3 machine
      
      v2: fix whitespace (Rodrigo)
      v3: remove incorrect chunk that came on -collector rebase. (Rodrigo)
      v4: change the comment (Zhipeng)
      v5: address Daniel's comment (Zhipeng)
      Signed-off-by: Zhipeng Gong <zhipeng.gong@intel.com>
      Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      8d360dff
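
      A hedged userspace-side sketch of steering a batch to a specific VCS ring with
      the new execbuffer flag (builds against libdrm; the I915_EXEC_BSD_* names below
      follow the pattern this patch introduces but should be checked against your
      i915_drm.h, and submit_hevc_batch is a made-up helper for illustration):

          #include <stdint.h>
          #include <i915_drm.h>
          #include <xf86drm.h>

          /* Fallback definitions for headers that predate this patch; treat the
           * exact names and values as illustrative. */
          #ifndef I915_EXEC_BSD_RING1
          #define I915_EXEC_BSD_SHIFT 13
          #define I915_EXEC_BSD_RING1 (1 << I915_EXEC_BSD_SHIFT)
          #define I915_EXEC_BSD_RING2 (2 << I915_EXEC_BSD_SHIFT)
          #endif

          static int submit_hevc_batch(int fd, uint32_t batch_handle, uint32_t batch_len)
          {
              struct drm_i915_gem_exec_object2 obj = { .handle = batch_handle };
              struct drm_i915_gem_execbuffer2 execbuf = {
                  .buffers_ptr = (uintptr_t)&obj,
                  .buffer_count = 1,
                  .batch_len = batch_len,
                  /* Pick the BSD engine and pin the work to VCS1 instead of
                   * letting the kernel ping-pong between VCS1 and VCS2. */
                  .flags = I915_EXEC_BSD | I915_EXEC_BSD_RING1,
              };
              return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
          }
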
  5. 08 Jan 2015, 1 commit
  6. 24 Dec 2014, 1 commit
  7. 16 Dec 2014, 4 commits
    • drm/i915: Tidy up execbuffer command parsing code · 71745376
      Brad Volkin authored
      Move it to a separate function since the main do_execbuffer function
      already has so much going on.
      
      v2:
      - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7
        feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      71745376
    • drm/i915: Mark shadow batch buffers as purgeable · 0079a7df
      Brad Volkin authored
      By adding a new exec_entry flag, we cleanly mark the shadow objects
      as purgeable after they are on the active list.
      
      v2:
      - Move 'shadow_batch_obj->madv = I915_MADV_WILLNEED' inside _get
        fnc (danvet, from v4 6/7 feedback)
      
      v3:
      - Remove duplicate 'madv = I915_MADV_WILLNEED' (danvet, from v6 4/5)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      0079a7df
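
      A toy model of the lifecycle this enables (the real driver uses obj->madv with
      I915_MADV_WILLNEED / I915_MADV_DONTNEED; the types and helpers below are
      illustrative stand-ins only):

          #include <stdbool.h>

          enum madv { MADV_WILLNEED, MADV_DONTNEED };

          struct pooled_batch {
              enum madv madv;
              bool      active;   /* still referenced by an unretired request */
          };

          /* The pool hands a buffer out for reuse: it must not be reclaimed
           * while queued, so the _get path restores WILLNEED. */
          static void batch_pool_get(struct pooled_batch *b)
          {
              b->madv = MADV_WILLNEED;
              b->active = true;
          }

          /* The new exec_entry flag marks the shadow copy purgeable once it is
           * safely on the active list. */
          static void batch_mark_purgeable(struct pooled_batch *b)
          {
              b->madv = MADV_DONTNEED;
          }

          /* The shrinker may drop its backing pages only after retirement. */
          static bool shrinker_may_reclaim(const struct pooled_batch *b)
          {
              return b->madv == MADV_DONTNEED && !b->active;
          }
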
    • drm/i915: Use batch length instead of object size in command parser · b9ffd80e
      Brad Volkin authored
      Previously we couldn't trust the user-supplied batch length because
      it came directly from userspace (i.e. untrusted code). It would have
      affected what commands software parsed without regard to what hardware
      would actually execute, leaving a potential hole.
      
      With the parser now copying the user supplied batch buffer and writing
      MI_NOP commands to any space after the copied region, we can safely use
      the batch length input. This should be a performance win as the actual
      batch length is frequently much smaller than the allocated object size.
      
      v2: Fix handling of non-zero batch_start_offset
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      b9ffd80e
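
      The copy-and-pad idea can be shown standalone (MI_NOOP really is command 0x0;
      everything else here is a simplified stand-in for the parser's copy step, and
      batch_len is assumed to be dword-aligned):

          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          #define MI_NOOP 0u   /* hardware no-op command */

          /* Copy only the user-declared batch_len bytes (starting at
           * batch_start_offset) into the shadow buffer and fill the remainder
           * with MI_NOOP, so the parser can trust batch_len: nothing beyond it
           * can reach the hardware. */
          static void copy_shadow_batch(uint32_t *shadow, size_t shadow_size,
                                        const void *user_batch,
                                        size_t batch_start_offset, size_t batch_len)
          {
              memcpy(shadow, (const uint8_t *)user_batch + batch_start_offset, batch_len);
              for (size_t i = batch_len / 4; i < shadow_size / 4; i++)
                  shadow[i] = MI_NOOP;
          }
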
    • drm/i915: Use batch pools with the command parser · 78a42377
      Brad Volkin authored
      This patch sets up all of the tracking and copying necessary to
      use batch pools with the command parser and dispatches the copied
      (shadow) batch to the hardware.
      
      After this patch, the parser is in 'enabling' mode.
      
      Note that performance takes a hit from the copy in some cases
      and will likely need some work. At a rough pass, the memcpy
      appears to be the bottleneck. Without having done a deeper
      analysis, two ideas that come to mind are:
      1) Copy sections of the batch at a time, as they are reached
         by parsing. Might improve cache locality.
      2) Copy only up to the userspace-supplied batch length and
         memset the rest of the buffer. Reduces the number of reads.
      
      v2:
      - Remove setting the capacity of the pool
      - One global pool instead of per-ring pools
      - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
      - Memset any space in the shadow batch beyond what gets copied
      - Rebased on execlist prep refactoring
      
      v3:
      - Rebase on chained batch handling
      - Squash in setting the secure dispatch flag
      - Add a note about the interaction w/secure dispatch pinning
      - Check for request->batch_obj == NULL in i915_gem_free_request
      
      v4:
      - Fix read domains for shadow_batch_obj
      - Remove the set_to_gtt_domain call from i915_parse_cmds
      - ggtt_pin/unpin in the parser block to simplify error handling
      - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
      - Remove i915_gem_batch_pool_put calls
      
      v5:
      - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after
        the parser (danvet, from v4 0/7 feedback)
      
      Issue: VIZ-4719
      Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-By: Jon Bloomfield <jon.bloomfield@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      78a42377
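
      A self-contained toy of the flow this patch sets up; every type and helper
      below is a stand-in and only the overall shape mirrors the driver: take a
      pooled shadow buffer, copy the user batch into it, parse the copy, and
      dispatch the copy rather than the user object:

          #include <stdbool.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          #define MI_NOOP 0u

          struct toy_batch {
              uint32_t cmds[256];
              size_t   len;                     /* valid bytes in cmds */
          };

          static struct toy_batch pool[4];
          static size_t pool_next;

          static struct toy_batch *batch_pool_get(void)
          {
              return &pool[pool_next++ % 4];    /* stands in for the real pool */
          }

          static bool parse_cmds(const struct toy_batch *shadow)
          {
              (void)shadow;                     /* real parser validates commands */
              return true;
          }

          static int dispatch_shadow(const struct toy_batch *shadow)
          {
              /* The driver dispatches the shadow with the secure flag set; see the
               * v3/v4 notes above about DISPATCH_SECURE and full PPGTT. */
              (void)shadow;
              return 0;
          }

          static int submit(const uint32_t *user_batch, size_t batch_len)
          {
              struct toy_batch *shadow = batch_pool_get();

              memcpy(shadow->cmds, user_batch, batch_len);
              for (size_t i = batch_len / 4; i < sizeof(shadow->cmds) / 4; i++)
                  shadow->cmds[i] = MI_NOOP;    /* nothing stale can execute */
              shadow->len = batch_len;

              if (!parse_cmds(shadow))
                  return -1;
              return dispatch_shadow(shadow);
          }
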
  8. 15 Dec 2014, 1 commit
    • drm/i915: Infrastructure for supporting different GGTT views per object · fe14d5f4
      Tvrtko Ursulin authored
      Things like reliable GGTT mappings and mirrored 2d-on-3d display will need
      to map objects into the same address space multiple times.
      
      Added a GGTT view concept and linked it with the VMA to distinguish between
      multiple instances per address space.
      
      New objects and GEM functions which do not take this new view as a parameter
      assume the default of zero (I915_GGTT_VIEW_NORMAL) which preserves the
      previous behaviour.
      
      This now means that objects can have multiple VMA entries so the code which
      assumed there will only be one also had to be modified.
      
      Alternative GGTT views are supposed to borrow DMA addresses from obj->pages
      which is DMA mapped on first VMA instantiation and unmapped on the last one
      going away.
      
      v2:
          * Removed per view special casing in i915_gem_ggtt_prepare /
            finish_object in favour of creating and destroying DMA mappings
            on first VMA instantiation and last VMA destruction. (Daniel Vetter)
          * Simplified i915_vma_unbind which does not need to count the GGTT views.
            (Daniel Vetter)
          * Also moved obj->map_and_fenceable reset under the same check.
          * Checkpatch cleanups.
      
      v3:
          * Only retire objects once the last VMA is unbound.
      
      v4:
          * Keep scatter-gather table for alternative views persistent for the
            lifetime of the VMA.
          * Propagate binding errors to callers and handle appropriately.
      
      v5:
          * Explicitly look for normal GGTT view in i915_gem_obj_bound to align
            usage in i915_gem_object_ggtt_unpin. (Michel Thierry)
          * Change to single if statement in i915_gem_obj_to_ggtt. (Michel Thierry)
          * Removed stray semi-colon in i915_gem_object_set_cache_level.
      
      For: VIZ-4544
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      [danvet: Drop hunk from i915_gem_shrink since it's just prettification
      but upsets a __must_check warning.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      fe14d5f4
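
      A toy model of the view concept (the real definitions live in the driver's GTT
      headers; the structs below are stand-ins): an object may now own several VMAs
      in the same address space, distinguished by the GGTT view, and the zero/normal
      view keeps the old single-mapping behaviour:

          #include <stddef.h>

          enum ggtt_view_type {
              GGTT_VIEW_NORMAL = 0,        /* default: previous behaviour */
              /* further view types (e.g. for mirrored scanout) come later */
          };

          struct toy_vma {
              const void          *vm;     /* address space the mapping lives in */
              enum ggtt_view_type  view;
              struct toy_vma      *next;
          };

          struct toy_object {
              struct toy_vma *vma_list;    /* possibly several per address space now */
          };

          /* Lookups must now match on (address space, view), not address space alone. */
          static struct toy_vma *obj_to_vma(struct toy_object *obj, const void *vm,
                                            enum ggtt_view_type view)
          {
              for (struct toy_vma *vma = obj->vma_list; vma; vma = vma->next)
                  if (vma->vm == vm && vma->view == view)
                      return vma;
              return NULL;
          }
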
  9. 03 Dec 2014, 4 commits
  10. 21 Nov 2014, 1 commit
  11. 20 Nov 2014, 2 commits
  12. 08 Nov 2014, 1 commit
  13. 04 Nov 2014, 2 commits
  14. 13 Aug 2014, 1 commit
    • drm/i915: Only track real ppgtt for a context · ae6c4806
      Daniel Vetter authored
      There's a bit of confusion since we track the global gtt,
      the aliasing and real ppgtt in the ctx->vm pointer, and not
      all callers really bother to check for the different cases; they just
      presume that it points to a real ppgtt.
      
      Now looking closely we don't actually need ->vm to always point at an
      address space - the only place that cares already has fixup code
      to decide whether to look at the per-process or the global
      address space.
      
      So switch to just tracking the ppgtt directly and ditch all the
      extraneous code.
      
      v2: Fixup the ppgtt debugfs file to not oops on a NULL ctx->ppgtt.
      Also drop the early exit - without aliasing ppgtt we want to dump all
      the ppgtts of the contexts if we have full ppgtt.
      
      v3: Actually git add the compile fix.
      Reviewed-by: Michel Thierry <michel.thierry@intel.com>
      Cc: "Thierry, Michel" <michel.thierry@intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      OTC-Jira: VIZ-3724
      [danvet: Resolve conflicts with execlist patches while applying.]
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      ae6c4806
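
      The fixup the commit refers to reduces to a simple selection; a stand-in sketch
      (types are toys, only the ctx->ppgtt-or-global-GTT choice mirrors the driver):

          #include <stddef.h>

          struct toy_address_space { int id; };
          struct toy_ppgtt         { struct toy_address_space base; };

          struct toy_context {
              struct toy_ppgtt *ppgtt;   /* NULL on aliasing-ppgtt / global-gtt setups */
          };

          static struct toy_address_space global_gtt;

          /* The one call site that needs an address space decides explicitly. */
          static struct toy_address_space *context_vm(struct toy_context *ctx)
          {
              return ctx->ppgtt ? &ctx->ppgtt->base : &global_gtt;
          }
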
  15. 12 Aug 2014, 1 commit
  16. 11 Aug 2014, 8 commits
  17. 08 Aug 2014, 1 commit
    • Revert "drm: drop redundant drm_file->is_master" · 7963e9db
      Dave Airlie authored
      This reverts commit 48ba8137.
      
      Thanks to Chris:
      "drm_file->is_master is not synomous with having drm_file->master ==
      drm_file->minor->master. This is because drm_file->master is the same
      for all drm_files of the same generation and so when there is a master,
      every drm_file believes itself to be the master. Confusion ensues and
      things go pear shaped when one file is closed and there is no master
      anymore."
      
      Conflicts:
      	drivers/gpu/drm/drm_drv.c
      	drivers/gpu/drm/drm_stub.c
      7963e9db
  18. 05 Aug 2014, 1 commit
    • drm: drop redundant drm_file->is_master · 48ba8137
      David Herrmann authored
      The drm_file->is_master field is redundant as it's equivalent to:
          drm_file->master && drm_file->master == drm_file->minor->master
      
      1) "=>"
        Whenever we set drm_file->is_master, we also set:
            drm_file->minor->master = drm_file->master;
      
        Whenever we clear drm_file->is_master, we also call:
            drm_master_put(&drm_file->minor->master);
        which implicitly clears it to NULL.
      
      2) "<="
        minor->master cannot be set if it is non-NULL. Therefore, it stays as
        is unless a file drops it.
      
        If minor->master is NULL, it is only set by places that also adjust
        drm_file->is_master.
      
      Therefore, we can safely drop is_master and replace it by an inline helper
      that matches:
          drm_file->master && drm_file->master == drm_file->minor->master
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      48ba8137
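
      The equivalence quoted above is easy to show as an inline predicate; a
      self-contained sketch (struct names are stand-ins and the helper name here is
      illustrative, not necessarily the one the patch adds):

          #include <stdbool.h>

          struct toy_master;

          struct toy_minor {
              struct toy_master *master;
          };

          struct toy_file {
              struct toy_master *master;
              struct toy_minor  *minor;
          };

          /* Replaces the cached is_master flag. */
          static inline bool file_is_master(const struct toy_file *file)
          {
              return file->master && file->master == file->minor->master;
          }
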
  19. 08 Jul 2014, 2 commits
    • drm/i915: Extract the actual workload submission mechanism from execbuffer · 78382593
      Oscar Mateo authored
      So that we isolate the legacy ringbuffer submission mechanism, which becomes
      a good candidate to be abstracted away. This is prep-work for Execlists (which
      will have its own workload submission mechanism).
      
      No functional changes.
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      78382593
    • drm/i915: Emphasize that ctx->id is merely a user handle · 821d66dd
      Oscar Mateo authored
      This is an Execlists preparatory patch, since Execlists make "context ID" an
      overloaded term:
      
      - In the software, it was used to distinguish which context userspace was
        trying to use.
      - In the BSpec, the term is used to describe the 20-bit field the
        hardware uses to discriminate the contexts that are submitted to
        the ELSP and inform the driver about their current status (via Context
        Switch Interrupts and Context Status Buffers).
      
      Initially, I tried to make the different meanings converge, but it proved
      impossible:
      
      - The software ctx->id is per-filp, while the hardware one needs to be
        globally unique.
      - Also, we multiplex several backing state objects per intel_context,
        and all of them need unique HW IDs.
      - I tried adding a per-filp ID and then composing the HW context ID as:
        ctx->id + file_priv->id + ring->id, but the fact that the hardware only
        uses 20 bits means we have to artificially limit the number of filps or
        contexts the userspace can create.
      
      The ctx->user_handle renaming bits are done with this Cocci patch (plus
      manual frobbing of the struct declaration):
      
          @@
          struct intel_context c;
          @@
          - (c).id
          + c.user_handle
      
          @@
          struct intel_context *c;
          @@
          - (c)->id
          + c->user_handle
      
      Also, while we are at it, s/DEFAULT_CONTEXT_ID/DEFAULT_CONTEXT_HANDLE and
      change the type to unsigned 32 bits.
      
      v2: s/handle/user_handle and change the type to uint32_t as suggested by
      Chris Wilson.
      
      Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v1)
      Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      821d66dd
  20. 20 Jun 2014, 1 commit
    • drm/i915: Track frontbuffer invalidation/flushing · f99d7069
      Daniel Vetter authored
      So these are the guts of the new beast. This tracks when a frontbuffer
      gets invalidated (due to frontbuffer rendering) and hence should be
      constantly scanned out, and when it's flushed again and can be
      compressed/one-shot-upload.
      
      Rules for flushing are simple: The frontbuffer needs one more full
      upload starting from the next vblank. Which means that the flushing
      can _only_ be called once the frontbuffer update has been latched.
      
      But this poses a problem for pageflips: We can't just delay the
      flushing until the pageflip is latched, since that would pose the risk
      that we override frontbuffer rendering that has been scheduled
      in-between the pageflip ioctl and the actual latching.
      
      To handle this track asynchronous invalidations (and also pageflip)
      state per-ring and delay any in-between flushing until the rendering
      has completed. And also cancel any delayed flushing if we get a new
      invalidation request (whether delayed or not).
      
      Also call intel_mark_fb_busy in all cases to make sure
      that we keep the screen at the highest refresh rate on flips,
      synchronous plane updates and frontbuffer rendering alike.
      
      v2: Lots of improvements
      
      Suggestions from Chris:
      - Move invalidate/flush in flush_*_domain and set_to_*_domain.
      - Drop the flush in busy_ioctl since it's redundant. Was a leftover
        from an earlier concept to track flips/delayed flushes.
      - Don't forget about the initial modeset enable/final disable.
        Suggested by Chris.
      
      Track flips accurately, too. Since flips complete independently of
      rendering we need to track pending flips in a separate mask. Again if
      an invalidate happens we need to cancel the eventual flush to avoid
      races.
      
      v3:
      Provide correct header declarations for flip functions. Currently not
      needed outside of intel_display.c, but part of the proper interface.
      
      v4: Add proper domain management to fbcon so that the fbcon buffer is
      also tracked correctly.
      
      v5: Fixup locking around the fbcon set_to_gtt_domain call.
      
      v6: More comments from Chris:
      - Split out fbcon changes.
      - Drop superfluous checks for potential scanout before calling intel_fb
        functions - we can micro-optimize this later.
      - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
        objects. We already have precedent for fb_obj in the pin_and_fence
        functions.
      
      v7: Clarify the semantics of the flip flush handling by renaming
      things a bit:
      - Don't go through a gem object but take the relevant frontbuffer bits
        directly. These functions center on the plane, the actual object is
        irrelevant - even a flip to the same object as already active should
        cause a flush.
      - Add a new intel_frontbuffer_flip for synchronous plane updates. It
        currently just calls intel_frontbuffer_flush since the implementation
        differs.
      
      This way we achieve a clear split between one-shot update events on
      one side and frontbuffer rendering with potentially a very long delay
      between the invalidate and flush.
      
      Chris and I also had some discussions about mark_busy and whether it
      is appropriate to call from flush. But mark busy is a state which
      should be derived from the 3 events (invalidate, flush, flip) we now
      have by the users, like psr does by tracking relevant information in
      psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
      frontbuffer) needs to have similar logic. With that the overall
      mark_busy in the core could be removed.
      
      v8: When retiring gpu buffers, only flush the frontbuffer bits we
      actually invalidated in a batch. Just for safety since before any
      additional usage/invalidate we should always retire current rendering.
      Suggested by Chris Wilson.
      
      v9: Actually use intel_frontbuffer_flip in all appropriate places.
      Spotted by Chris.
      
      v10: Address more comments from Chris:
      - Don't call _flip in set_base when the crtc is inactive, avoids redundancy
        in the modeset case with the initial enabling of all planes.
      - Add comments explaining that the initial/final plane enable/disable
        still has work left to do before it's fully generic.
      
      v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
      
      v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
      
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      f99d7069
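
      The rule set above can be summarised as bitmask bookkeeping; a toy sketch (the
      real driver spreads this across intel_frontbuffer_* helpers and per-ring state,
      so the names and the single pair of masks below are simplifications):

          #include <stdint.h>

          static uint32_t busy_bits;   /* frontbuffers with rendering not yet flushed */
          static uint32_t flip_bits;   /* frontbuffers with a pageflip still pending  */

          /* Frontbuffer rendering was queued: treat the plane as dirty and cancel
           * any flush that a pending flip would otherwise trigger. */
          static void frontbuffer_invalidate(uint32_t frontbuffer_bits)
          {
              busy_bits |= frontbuffer_bits;
              flip_bits &= ~frontbuffer_bits;
          }

          /* The update has been latched (one more full scanout from the next
           * vblank), so compression / one-shot upload may resume. */
          static void frontbuffer_flush(uint32_t frontbuffer_bits)
          {
              busy_bits &= ~frontbuffer_bits;
          }

          /* Pageflip ioctl accepted: remember it, since flips complete
           * independently of rendering. */
          static void frontbuffer_flip_prepare(uint32_t frontbuffer_bits)
          {
              flip_bits |= frontbuffer_bits;
          }

          /* The flip latched: flush, unless an in-between invalidate cancelled it.
           * A synchronous plane update would simply call the flush directly. */
          static void frontbuffer_flip_complete(uint32_t frontbuffer_bits)
          {
              frontbuffer_bits &= flip_bits;
              flip_bits &= ~frontbuffer_bits;
              frontbuffer_flush(frontbuffer_bits);
          }
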
  21. 13 Jun 2014, 1 commit
    • drm/i915: Fix __user sparse warning · d593d992
      Ville Syrjälä authored
      CHECK   linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47: warning: incorrect type in initializer (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    expected struct drm_i915_gem_exec_object2 *user_exec_list
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1529:47:    got void [noderef] <asn:1>*
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61: warning: incorrect type in argument 1 (different address spaces)
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    expected void [noderef] <asn:1>*dst
      linux/drivers/gpu/drm/i915/i915_gem_execbuffer.c:1533:61:    got unsigned long long *<noident>
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      d593d992
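
      The warnings amount to assigning a user-space pointer to a plain kernel pointer;
      a stand-in sketch of the kind of declaration that satisfies sparse (outside the
      kernel __user expands to nothing, inside the kernel it is the sparse
      address-space annotation; the struct is a toy, not drm_i915_gem_exec_object2):

          #include <stdint.h>

          #ifndef __user
          #define __user   /* sparse annotation; empty outside the kernel */
          #endif

          struct toy_exec_object2 { uint64_t offset; };

          static void declare_user_list(uint64_t buffers_ptr)
          {
              /* Annotate the pointer that aliases user memory and go through an
               * integer cast, so both sides of the later copy_to_user() agree on
               * the address space. */
              struct toy_exec_object2 __user *user_exec_list =
                      (struct toy_exec_object2 __user *)(uintptr_t)buffers_ptr;
              (void)user_exec_list;
          }
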
  22. 27 May 2014, 2 commits
    • drm/i915: Prevent negative relocation deltas from wrapping · d23db88c
      Chris Wilson authored
      This is pure evil. Userspace, I'm looking at you SNA, repacks batch
      buffers on the fly after generation as they are being passed to the
      kernel for execution. These batches also contain self-referenced
      relocations as a single buffer encompasses the state commands, kernels,
      vertices and sampler. During generation the buffers are placed at known
      offsets within the full batch, and then the relocation deltas (as passed
      to the kernel) are tweaked as the batch is repacked into a smaller buffer.
      This means that userspace is passing negative relocation deltas, which
      subsequently wrap to large values if the batch is at a low address. The
      GPU hangs when it then tries to use the large value as a base for its
      address offsets, rather than wrapping back to the real value (as one
      would hope). As the GPU uses positive offsets from the base, we can
      treat the relocation address as the minimum address read by the GPU.
      For the upper bound, we trust that userspace will not read beyond the
      end of the buffer.
      
      So, how do we fix negative relocations from wrapping? We can either
      check that every relocation looks valid when we write it, and then
      position each object such that we prevent the offset wraparound, or we
      just special-case the self-referential behaviour of SNA and force all
      batches to be above 256k. Daniel prefers the latter approach.
      
      This fixes a GPU hang when it tries to use an address (relocation +
      offset) greater than the GTT size. The issue would occur quite easily
      with full-ppgtt as each fd gets its own VM space, so low offsets would
      often be handed out. However, with the rearrangement of the low GTT due
      to capturing the BIOS framebuffer, it is already affecting kernels 3.15
      onwards. I think only IVB+ is susceptible to this bug, but the workaround
      should only kick in rarely, so it seems sensible to always apply it.
      
      v3: Use a bias for batch buffers to prevent small negative delta relocations
      from wrapping.
      
      v4 from Daniel:
      - s/BIAS/BATCH_OFFSET_BIAS/
      - Extract eb_vma_misplaced/i915_vma_misplaced since the conditions
        were growing rather cumbersome.
      - Add a comment to eb_get_batch explaining why we do this.
      - Apply the batch offset bias everywhere but mention that we've only
        observed it on gen7 gpus.
      - Drop PIN_OFFSET_FIX for now, that slipped in from a feature patch.
      
      v5: Add static to eb_get_batch, spotted by 0-day tester.
      
      Testcase: igt/gem_bad_reloc
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78533
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v3)
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      d23db88c
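
      The wrap and the fix are plain 32-bit arithmetic; a runnable demonstration
      (BATCH_OFFSET_BIAS is the name mentioned in v4 above, the rest is illustrative):

          #include <stdint.h>
          #include <stdio.h>

          #define BATCH_OFFSET_BIAS (256 * 1024)

          /* Relocation values are 32-bit GTT addresses: batch address + delta. */
          static uint32_t apply_reloc(uint32_t batch_offset, int32_t delta)
          {
              return batch_offset + (uint32_t)delta;
          }

          int main(void)
          {
              /* Batch bound at a low address: a small negative delta wraps to a
               * huge address and the GPU wanders off past the GTT. */
              printf("0x%08x\n", apply_reloc(0x00001000, -0x2000)); /* 0xfffff000 */

              /* Batch biased to at least 256 KiB: the same delta stays in range. */
              printf("0x%08x\n", apply_reloc(BATCH_OFFSET_BIAS + 0x1000, -0x2000));
              return 0;
          }
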
    • drm/i915: Only copy back the modified fields to userspace from execbuffer · 9aab8bff
      Chris Wilson authored
      We only want to modify a single field in the userspace view of the
      execbuffer command buffer, so explicitly change that rather than copy
      everything back again.
      
      This serves two purposes:
      
      1. The single fields are much cheaper to copy (constant size so the
      copy uses special case code) and much smaller than the whole array.
      
      2. We modify the array for internal use, and those modifications need
      to be masked from the user.
      
      Note: We need this backported since without it the next bugfix will
      blow up when userspace recycles batchbuffers and relocations.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      9aab8bff
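
      A toy version of the writeback (in the driver the inner copy is a per-field
      copy_to_user() on the offset of each entry; the structs here are stand-ins):

          #include <stddef.h>
          #include <stdint.h>
          #include <string.h>

          struct toy_exec_entry { uint64_t offset; uint64_t flags; };

          /* Copy back only the offset of each entry: internal-only flag changes
           * stay hidden from userspace and each transfer is a small, constant-size
           * copy instead of the whole array. */
          static void write_back_offsets(struct toy_exec_entry *user_list,
                                         const struct toy_exec_entry *kernel_list,
                                         size_t count)
          {
              for (size_t i = 0; i < count; i++)
                  memcpy(&user_list[i].offset, &kernel_list[i].offset,
                         sizeof(user_list[i].offset));
          }
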