- 03 Sep 2014, 3 commits
-
-
Committed by Ville Syrjälä
gen2/3 platforms have a boatload of rings we're not using. On my 830 the BIOS/hw can leave some of those "active" after resume, which will prevent C3 entry. The ring is apparently considered active whenever head != tail, even if the ring is disabled. Disable and clear all such unused ringbuffers on init/resume. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
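A minimal sketch of the idea, assuming the usual per-ring register layout (the exact set of unused ring bases on gen2/3 and the init/resume hook are not spelled out here):

```c
static void stop_unused_ring(struct drm_i915_private *dev_priv, u32 base)
{
        /* Disable the ring and force head == tail so the hardware no
         * longer considers it active and can drop into C3. */
        I915_WRITE(RING_CTL(base), 0);
        I915_WRITE(RING_HEAD(base), 0);
        I915_WRITE(RING_TAIL(base), 0);
        I915_WRITE(RING_START(base), 0);
}
```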
-
Committed by Thomas Daniel
These two functions make no sense in a Logical Ring Context & Execlists world. v2: We got rid of lrc_enabled and centralized everything in the sanitized i915.enable_execlists instead. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> v3: Rebased. Corrected a typo in the comment for i915_switch_context and added a comment that it should not be called in execlist mode. Added WARN_ON if i915_switch_context is called in execlist mode. Moved the check for execlist mode out of i915_switch_context and into the callers. Added a comment in context_reset explaining why nothing is done in execlist mode. Signed-off-by: Thomas Daniel <thomas.daniel@intel.com> [danvet: Simplify the patch subject so I can understand it.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
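A rough sketch of the guard described in v3 (simplified; the real function carries more legacy-path logic):

```c
/* Legacy context switching is meaningless under Execlists; callers are
 * expected to check the mode first, so reaching this is a bug. */
int i915_switch_context(struct intel_engine_cs *ring,
                        struct intel_context *to)
{
        if (WARN_ON(i915.enable_execlists))
                return 0;

        return do_switch(ring, to);     /* legacy MI_SET_CONTEXT path */
}
```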
-
Committed by McAulay, Alistair
This patch is to address Daniel's concerns over different code during reset: http://lists.freedesktop.org/archives/intel-gfx/2014-June/047758.html "The reason for aiming as hard as possible to use the exact same code for driver load, gpu reset and runtime pm/system resume is that we've simply seen too many bugs due to slight variations and unintended omissions." Tested using igt drv_hangman. V2: Cleaner way of preventing check_wedge returning -EAGAIN V3: Clear the last_context during reset, to ensure do_switch() does the MI_SET_CONTEXT. As per review. Signed-off-by: McAulay, Alistair <alistair.mcaulay@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [danvet: Rebase over ctx->ppgtt rework and extend the comment in check_wedge a bit.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 20 Aug 2014, 1 commit
-
-
Committed by Oscar Mateo
If we reset a ring after a hang, we have to make sure that we clear out all queued Execlists requests. v2: The ring is, at this point, already being correctly re-programmed for Execlists, and the hangcheck counters cleared. v3: Daniel suggests dropping the "if (execlists)" because the Execlists queue should be empty in legacy mode (which is true, if we do the INIT_LIST_HEAD). v4: Do the pending intel_runtime_pm_put Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 15 Aug 2014, 1 commit
-
-
Committed by Oscar Mateo
On a previous iteration of this patch, I created an Execlists version of __i915_add_request and abstracted it away as a vfunc. Daniel Vetter wondered then why that was needed: "with the clean split in command submission I expect every function to know whether it'll submit to an lrc (everything in intel_lrc.c) or whether it'll submit to a legacy ring (existing code), so I don't see a need for an add_request vfunc." The honest, hairy truth is that this patch is the glue keeping the whole logical ring puzzle together: - i915_add_request is used by intel_ring_idle, which in turn is used by i915_gpu_idle, which in turn is used in several places inside the eviction and gtt codes. - Also, it is used by i915_gem_check_olr, which is littered all over i915_gem.c - ... If I were to duplicate all the code that directly or indirectly uses __i915_add_request, I'd end up creating a separate driver. To show the differences between the existing legacy version and the new Execlists one, this time I have special-cased __i915_add_request instead of adding an add_request vfunc. I hope this helps to untangle this Gordian knot. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: Adjust to ringbuf->FIXME_lrc_ctx per the discussion with Thomas Daniel.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
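The shape of the special-casing is roughly the following (a sketch, not the actual diff; field names are taken from that era's structures and may differ):

```c
/* One __i915_add_request body; only the ringbuffer/context selection
 * differs between legacy and Execlists submission. */
static struct intel_ringbuffer *
request_ringbuf(struct intel_engine_cs *ring, struct intel_context *ctx)
{
        if (i915.enable_execlists)
                return ctx->engine[ring->id].ringbuf;   /* per-context LRC ring */

        return ring->buffer;                            /* single legacy ring */
}
```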
-
- 13 Aug 2014, 5 commits
-
-
Committed by Daniel Vetter
Currently we abuse the aliasing ppgtt to set up the ppgtt support in general, which is a bit backwards since with full ppgtt we don't ever need the aliasing ppgtt. So untangle this and separate the ppgtt init from the aliasing ppgtt. While at it drag it out of the context enabling (which just does a switch to the default context). Note that we still have the differentiation between synchronous and asynchronous ppgtt setup, but that will soon vanish. So also correctly wire up the return value handling to be prepared for when ->switch_mm drops the synchronous parameter and could start to fail. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter
A subsequent patch will no longer initialize the aliasing ppgtt if we have full ppgtt enabled, since we simply don't need that any more. Unfortunately a few places check for the aliasing ppgtt instead of checking for ppgtt in general. Fix them up. One special case are the gtt offset and size macros, which have some code to remap the aliasing ppgtt to the global gtt. The aliasing ppgtt is _not_ a logical address space, so passing that in as the vm is plain and simply a bug. So just WARN about it and carry on - we have a graceful fall-through anyway if we can't find the vma. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter
We already need this just as a safety check in case the preallocation reservation dance fails. But we definitely need this to be able to move the aliasing ppgtt setup back out of the context code to this place, where it belongs. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter
Stuff in headers really ought to have this. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Daniel Vetter
This essentially unbreaks non-ppgtt operation, where we'd scribble over random memory. While at it give the vm_to_ppgtt function a proper prefix and make it a bit more paranoid. Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 12 Aug 2014, 2 commits
-
-
Committed by Daniel Vetter
So when reviewing Michel's patch I've noticed a few things and cleaned them up: - The early checks in ppgtt_release are now redundant: The inactive list should always be empty now, so we can ditch these checks. Even for the aliasing ppgtt (though that's a different confusion) since we tear that down after all the objects are gone. - The ppgtt handling functions are splattered all over. Consolidate them in i915_gem_gtt.c, give them OCD prefixes and add wrappers for get/put. - There was a bit of confusion in ppgtt_release about whether it cares about the active or inactive list. It should care about them both, so augment the WARNINGs to check for both. There's still create_vm_for_ctx left to do, but that is blocked on the removal of ppgtt->ctx. Once that's done we can rename it to i915_ppgtt_create and move it to its siblings for handling ppgtts. v2: Move the ppgtt checks into the inline get/put functions as suggested by Chris. v3: Inline the now redundant ppgtt local variable. Cc: Michel Thierry <michel.thierry@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
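The get/put wrappers mentioned in v2 would look roughly like this (a sketch assuming the ppgtt carries a plain kref; exact member and release-function names may differ):

```c
static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
{
        if (ppgtt)                      /* NULL check lives in the wrapper (v2) */
                kref_get(&ppgtt->ref);
}

static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
{
        if (ppgtt)
                kref_put(&ppgtt->ref, i915_ppgtt_release);
}
```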
-
Committed by Michel Thierry
VMAs should take a reference of the address space they use. Now, when the fd is closed, it will release the ref that the context was holding, but it will still be referenced by any vmas that are still active. ppgtt_release() should then only be called when the last thing referencing it releases the ref, and it can just call the base cleanup and free the ppgtt. Note that with this we will extend the lifetime of ppgtts which contain shared objects. But all the non-shared objects will get removed as soon as they drop off the active list, and for the shared ones the shrinker can eventually reap them. Since we currently can't evict ppgtt pagetables either, I don't think that temporary leak is important. Signed-off-by: Michel Thierry <michel.thierry@intel.com> [danvet: Add note about potential ppgtt leak with this approach.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 11 Aug 2014, 6 commits
-
-
Committed by Oscar Mateo
Execlists are indeed a brave new world with respect to workload submission to the GPU. In previous versions of this series, I have tried to impact the legacy ringbuffer submission path as little as possible (mostly, passing the context around and using the correct ringbuffer when I needed one) but Daniel is afraid (probably with reason) that these changes and, especially, future ones, will end up breaking older gens. This commit and some others coming next will try to limit the damage by creating an alternative path for workload submission. The first step is here: laying out a new ring init/fini. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Oscar Mateo
As suggested by Daniel Vetter. The idea, in subsequent patches, is to provide an alternative to these vfuncs for the Execlists submission mechanism. v2: Split into two and reordered to illustrate our intentions, instead of showing it off. Also, removed the add_request vfunc and added the stop_ring one. Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> [danvet: - Make checkpatch happy. - Be grumpy about the excessive vtable. - Ditch gt->is_ring_initialized.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Oscar Mateo
GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts". These expanded contexts enable a number of new abilities, especially "Execlists". The macro is defined to off until we have enough in place for it to have a hope of working. v2: Rename "advanced contexts" to the more correct "logical ring contexts". v3: Add a module parameter to enable execlists. Execlists are relatively new, and so it'd be wise to be able to switch back to ring submission to debug subtle problems that will inevitably arise. v4: Add an intel_enable_execlists function. v5: Sanitize early, as suggested by Daniel. Remove lrc_enabled. Signed-off-by: Ben Widawsky <ben@bwidawsk.net> (v1) Signed-off-by: Damien Lespiau <damien.lespiau@intel.com> (v3) Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> (v2, v4 & v5) Reviewed-by: Damien Lespiau <damien.lespiau@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
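The "sanitize early" step from v5 would look roughly like the sketch below: the module parameter is resolved into a plain 0/1 once at load time, so every later check only reads the sanitized value. Macro and helper names here are assumptions.

```c
static int sanitize_enable_execlists(struct drm_device *dev, int enable_execlists)
{
        if (enable_execlists == 0)
                return 0;               /* explicitly disabled */

        /* Only enable where the hardware and the rest of the driver
         * (logical ring contexts + full ppgtt) can actually support it. */
        if (HAS_LOGICAL_RING_CONTEXTS(dev) && USES_PPGTT(dev))
                return 1;

        return 0;
}
```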
-
Committed by Chris Wilson
This migrates the fence tracking onto the existing seqno infrastructure so that the later conversion to tracking via requests is simplified. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
Move the decision on whether we need to have a mappable object during execbuffer to the fore and then reuse that decision by propagating the flag through to reservation. As a corollary, before doing the actual relocation through the GTT, we can make sure that we do have a GTT mapping through which to operate. Note that the key to making this work is to ditch the obj->map_and_fenceable unbind optimization - with full ppgtt it doesn't make a lot of sense any more anyway. v2: Revamp and resend to ease future patches. v3: Refresh patch rationale References: https://bugs.freedesktop.org/show_bug.cgi?id=81094 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Widawsky <benjamin.widawsky@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> [danvet: Explain why obj->map_and_fenceable is key and split out the secure batch fix.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
If an object is not bound into the global GTT, then it cannot be accessed via the GTT. This restores the original code that was muddled by ppGTT. In the process, we remove a WARN that had long outlived its usefulness and was simply being coded around instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 07 Aug 2014, 1 commit
-
-
Committed by Deepak S
We might be leaving the GPU frequency (and thus vnn) high during suspend. Flushing the delayed work queue should take care of this. Signed-off-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 24 Jul 2014, 1 commit
-
-
Committed by Thomas Gleixner
Use ktime_get_raw_ns() and get rid of the back and forth timespec conversions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: John Stultz <john.stultz@linaro.org>
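In a minimal form the change amounts to something like this sketch (the surrounding i915 call site is omitted):

```c
#include <linux/timekeeping.h>

/* One call on the raw monotonic clock, already in nanoseconds, instead
 * of getrawmonotonic() into a struct timespec plus timespec_to_ns(). */
static u64 sample_raw_ns(void)
{
        return ktime_get_raw_ns();
}
```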
-
- 23 Jul 2014, 4 commits
-
-
Committed by Chris Wilson
An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit 48018a57 (Author: Paulo Zanoni <paulo.r.zanoni@intel.com>, Date: Fri Dec 13 15:22:31 2013 -0200, "drm/i915: release the GTT mmaps when going into D3"). Also note that the WARN is inappropriate for this function, as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition, and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> [danvet: cherry-pick from -next to -fixes.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Armin Reese
When using an IOMMU, GEM objects are mapped by their DMA address, as the physical address is unknown. This depends on the underlying IOMMU driver to map and unmap the physical pages properly as defined in intel_iommu.c. The current code will tell the IOMMU to unmap the GEM BO's pages on the destruction of the first VMA that "maps" that BO. This is clearly wrong as there may be other VMAs "mapping" that BO (using flink). The scanout is one such example. The patch fixes this issue by only unmapping the DMA maps when there are no more VMAs mapping that object. This is equivalent to when an object is considered unbound, as can be seen by the code. On the first VMA that again becomes bound, we will remap. An alternate solution would be to move the dma mapping to object creation and destruction. I am not sure if this is considered an unfriendly thing to do. Some notes to backporters trying to backport full PPGTT: The bug can never be hit without enabling the IOMMU. The existing code will also do the right thing when the object is shared via dmabuf. The failure should be demonstrable with flink. In cases when not using intel_iommu_strict it is likely (likely, as defined by: off the top of my head) on current workloads to *not* hit this bug since we often teardown all VMAs for an object shared across multiple VMs. We also finish access to that object before the first dma_unmapping. intel_iommu_strict with flinked buffers is likely to hit this issue. Signed-off-by: Armin Reese <armin.c.reese@intel.com> [danvet: Add the excellent commit message provided by Ben.] Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Jesse Barnes
Now that we use the runtime IRQ enable/disable functions in our suspend path, we can simply check the pm._irqs_disabled flag everywhere. So rename it to catch the users, and add an inline for it to make the checks clear everywhere. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
Whilst waiting to obtain our locks for the last-resort shrinking before an oom, we check whether or not a fatal signal was pending. If there was, we do not need to keep waiting as the oom will be aborted. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
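The bail-out can be pictured with a sketch along these lines (illustrative only; the real shrinker/oom notifier code is more involved):

```c
#include <linux/mutex.h>
#include <linux/sched.h>

/* Spin for the lock, but give up if the task has already been fatally
 * signalled - the oom that triggered us will be aborted anyway. */
static bool trylock_or_bail(struct mutex *lock)
{
        while (!mutex_trylock(lock)) {
                if (fatal_signal_pending(current))
                        return false;   /* report nothing freed */
                schedule();             /* yield and retry */
        }
        return true;
}
```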
-
- 08 Jul 2014, 3 commits
-
-
Committed by Oscar Mateo
Again, it's low-level enough to simply take a ringbuf and nothing else. Trivial change. Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Oscar Mateo <oscar.mateo@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
Make the assumption that media workloads are not as latency sensitive for __wait_seqno, and that upclocking the GPU does not affect the BLT engine. Under that assumption, we only want to forcibly upclock the GPU when we are stalling for results from the render pipeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Deepak S <deepak.s@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Rodrigo Vivi
The ring index calculation table was out of date after other rings were added, although the formula is flexible and scales when adding new rings. So this patch just updates the comments and adds a brief explanation of why sync_seqno[ring index] is used. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Ben Widawsky <ben@bwidawsk.net> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
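For context, the index formula being documented is roughly the following (a sketch of the era's sync-index helper; treat the exact name and ring ordering as illustrative):

```c
static inline int ring_sync_index(struct intel_engine_cs *ring,
                                  struct intel_engine_cs *other)
{
        int idx;

        /* Each ring keeps the last-seen seqno of every *other* ring in
         * sync_seqno[], so the index is the distance to the other ring
         * minus one, wrapped around the total number of rings. */
        idx = (other - ring) - 1;
        if (idx < 0)
                idx += I915_NUM_RINGS;

        return idx;
}
```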
-
- 20 Jun 2014, 1 commit
-
-
Committed by Daniel Vetter
So these are the guts of the new beast. This tracks when a frontbuffer gets invalidated (due to frontbuffer rendering) and hence should be constantly scanned out, and when it's flushed again and can be compressed/one-shot-uploaded. Rules for flushing are simple: The frontbuffer needs one more full upload starting from the next vblank. Which means that the flushing can _only_ be called once the frontbuffer update has been latched. But this poses a problem for pageflips: We can't just delay the flushing until the pageflip is latched, since that would pose the risk that we override frontbuffer rendering that has been scheduled in-between the pageflip ioctl and the actual latching. To handle this, track asynchronous invalidations (and also pageflip state) per-ring and delay any in-between flushing until the rendering has completed. And also cancel any delayed flushing if we get a new invalidation request (whether delayed or not). Also call intel_mark_fb_busy in all cases to make sure that we keep the screen at the highest refresh rate both on flips, synchronous plane updates and for frontbuffer rendering. v2: Lots of improvements Suggestions from Chris: - Move invalidate/flush in flush_*_domain and set_to_*_domain. - Drop the flush in busy_ioctl since it's redundant. Was a leftover from an earlier concept to track flips/delayed flushes. - Don't forget about the initial modeset enable/final disable. Suggested by Chris. Track flips accurately, too. Since flips complete independently of rendering we need to track pending flips in a separate mask. Again if an invalidate happens we need to cancel the eventual flush to avoid races. v3: Provide correct header declarations for flip functions. Currently not needed outside of intel_display.c, but part of the proper interface. v4: Add proper domain management to fbcon so that the fbcon buffer is also tracked correctly. v5: Fixup locking around the fbcon set_to_gtt_domain call. v6: More comments from Chris: - Split out fbcon changes. - Drop superfluous checks for potential scanout before calling intel_fb functions - we can micro-optimize this later. - s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem objects. We already have precedence for fb_obj in the pin_and_fence functions. v7: Clarify the semantics of the flip flush handling by renaming things a bit: - Don't go through a gem object but take the relevant frontbuffer bits directly. These functions center on the plane; the actual object is irrelevant - even a flip to the same object as already active should cause a flush. - Add a new intel_frontbuffer_flip for synchronous plane updates. It currently just calls intel_frontbuffer_flush since the implementation differs. This way we achieve a clear split between one-shot update events on one side and frontbuffer rendering with potentially a very long delay between the invalidate and flush. Chris and I also had some discussions about mark_busy and whether it is appropriate to call from flush. But mark_busy is a state which should be derived from the 3 events (invalidate, flush, flip) we now have by the users, like psr does by tracking relevant information in psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for frontbuffer) needs to have similar logic. With that the overall mark_busy in the core could be removed. v8: When retiring gpu buffers, only flush the frontbuffer bits we actually invalidated in a batch. Just for safety, since before any additional usage/invalidate we should always retire current rendering. v9: Actually use intel_frontbuffer_flip in all appropriate places. Spotted by Chris. v10: Address more comments from Chris: - Don't call _flip in set_base when the crtc is inactive; avoids redundancy in the modeset case with the initial enabling of all planes. - Add comments explaining that the initial/final plane enable/disable still has work left to do before it's fully generic. v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris. v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
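A schematic of the invalidate/flush pairing described above, with hypothetical wrapper names standing in for the real entry points (the actual interface takes per-plane frontbuffer bits, and page flips go through the separate intel_frontbuffer_flip path):

```c
/* Hypothetical sketch: bracket frontbuffer rendering so PSR/FBC/DRRS
 * stop compressing while the content is changing, and resume once the
 * new contents have been latched for scanout. */
static void frontbuffer_render_example(struct drm_device *dev,
                                       unsigned int frontbuffer_bits)
{
        frontbuffer_invalidate(dev, frontbuffer_bits); /* rendering starts */

        /* ... CPU/GTT writes or a batch targeting the scanout buffer ... */

        frontbuffer_flush(dev, frontbuffer_bits);      /* one more full upload */
}
```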
-
- 19 Jun 2014, 2 commits
-
-
Committed by Daniel Vetter
So from just a quick look we seem to have enough information to accurately figure out whether a given gem bo is used as a frontbuffer and where exactly: We have obj->pin_count as a first check with no false negatives and only negligible false positives. And then we can just walk the modeset objects and figure out where exactly a buffer is used as scanout. Except that we can't, due to locking order: If we already hold dev->struct_mutex we can't acquire any modeset locks, so we could potentially chase freed pointers and other evil stuff. So we need something else. For that, introduce a new set of bits obj->frontbuffer_bits to track where a buffer object is used. That we can then chase without grabbing any modeset locks. Of course the consumers of this (DRRS, PSR, FBC, ...) still need to be able to do their magic both when called from modeset and from gem code. But that can be easily achieved by adding locks for these specific subsystems which always nest within either kms or gem locking. This patch just adds the relevant update code to all places. Note that if we ever support multi-planar scanout targets then we need one frontbuffer tracking bit per attachment point that we expose to userspace. v2: - Fix more oopsen. Oops. - WARN if we leak obj->frontbuffer_bits when freeing a gem buffer. Fix the bugs this brought to light. - s/update_frontbuffer_bits/update_fb_bits/. More consistent with the fb tracking functions (fb for gem object, frontbuffer for raw bits). And the function name was way too long. v3: Size obj->frontbuffer_bits correctly so that all pipes fit in. v4: Don't update fb bits in set_base on failure. Noticed by Chris. v5: s/i915_gem_update_fb_bits/i915_gem_track_fb/ Also remove a few local enum pipe variables which are no longer needed, to keep the function arguments from running over the 80 char limit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
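One possible encoding of obj->frontbuffer_bits, with a fixed number of bits per pipe so every plane type on every pipe gets its own tracking bit (the exact macro names and bit assignments are assumptions here):

```c
#define FRONTBUFFER_BITS_PER_PIPE 4
#define FRONTBUFFER_PRIMARY(pipe)  (1 << (FRONTBUFFER_BITS_PER_PIPE * (pipe)))
#define FRONTBUFFER_CURSOR(pipe)   (1 << (1 + FRONTBUFFER_BITS_PER_PIPE * (pipe)))
#define FRONTBUFFER_SPRITE(pipe)   (1 << (2 + FRONTBUFFER_BITS_PER_PIPE * (pipe)))
#define FRONTBUFFER_OVERLAY(pipe)  (1 << (3 + FRONTBUFFER_BITS_PER_PIPE * (pipe)))
#define FRONTBUFFER_ALL_MASK(pipe) (0xf << (FRONTBUFFER_BITS_PER_PIPE * (pipe)))
```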
-
Committed by Daniel Vetter
It doesn't make sense to never again schedule the work, since by the time we might want to re-enable psr the world might have changed and we can do it again. The only exception is when we shut down the pipe, but that's an entirely different thing and needs to be handled in psr_disable. Note that a later patch will again split psr_exit into psr_invalidate and psr_flush. But the split is different and this simplification helps with the transition. v2: Improve the commit message a bit. Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 18 Jun 2014, 3 commits
-
-
Committed by Daniel Vetter
A WARN_ON is perfectly fine. The BUG in here seems to be the cause behind hard-hangs when I cat the i915_gem_pageflip debugfs file (which calls this from an irq spinlock), but only while running a full igt run after a while. I still need to root-cause the underlying issue. I'll also start rejecting patches which add new BUG_ONs but don't come with a really good justification for them. The general rule really should be to just WARN and hope the driver survives for long enough. v2: Make the WARN a bit more useful per Chris' suggestion. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Ville Syrjälä
Take the minimum of the object size and the vma size and prefault only that much. Avoids a SIGBUS when mmapping only a portion of the object. Prefaulting was introduced in commit b90b91d8 (Author: Chris Wilson <chris@chris-wilson.co.uk>, Date: Tue Jun 10 12:14:40 2014 +0100, "drm/i915: Prefault the entire object on first page fault"). Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Testcase: igt/gem_mmap/short-mmap Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
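The clamp amounts to something like this fragment of the fault handler (variable and helper names are illustrative):

```c
static unsigned long prefault_page_count(struct vm_area_struct *vma,
                                         struct drm_i915_gem_object *obj)
{
        /* Prefault only what both the mmap and the object actually cover,
         * so a partial mmap can no longer run past its end and SIGBUS. */
        unsigned long size = min_t(unsigned long,
                                   vma->vm_end - vma->vm_start,
                                   obj->base.size);

        return size >> PAGE_SHIFT;
}
```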
-
Committed by David Herrmann
Instead of shuffling gfp-masks all the time, use the shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are set in mapping_gfp_mask() for i915, so the behavior is still the same. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
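In essence the helper replaces the open-coded gfp handling with a single call along these lines (a sketch, not the actual diff):

```c
#include <linux/shmem_fs.h>

/* Read one backing page of a GEM object; shmem_read_mapping_page()
 * applies the mapping's gfp mask itself, so no manual mask juggling. */
static struct page *read_backing_page(struct drm_i915_gem_object *obj,
                                      pgoff_t index)
{
        struct address_space *mapping = file_inode(obj->base.filp)->i_mapping;

        return shmem_read_mapping_page(mapping, index);
}
```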
-
- 17 Jun 2014, 2 commits
-
-
Committed by Sourab Gupta
This patch enables the framework for using MMIO based flip calls, in contrast with the CS based flip calls which are being used currently. MMIO based flip calls can be enabled on architectures where the Render and Blitter engines reside in different power wells. The decision to use MMIO flips can be made based on workloads to give 100% residency for the Media power well. v2: The MMIO flips now use the interrupt driven mechanism for issuing the flips when the target seqno is reached. (Incorporating Ville's idea) v3: Rebasing on latest code. Code restructuring after incorporating Damien's comments v4: Addressing Ville's review comments - general cleanup - updating only the base addr instead of calling update_primary_plane - extending the patch for gen5+ platforms v5: Addressed Ville's review comments - Making mmio flip vs cs flip selection based on a module parameter - Adding a check for the DRIVER_MODESET feature in notify_ring before calling notify mmio flip. - Other changes mostly in function arguments v6: - Having a separate function to check the condition for using mmio flips (Ville) - propagating the error code from i915_gem_check_olr (Ville) v7: - Adding __must_check with i915_gem_check_olr (Chris) - Renaming mmio_flip_data to mmio_flip (Chris) - Rebasing on latest nightly v8: - Rebasing on latest code - squash 3rd patch in series (mmio setbase vs page flip race) with this patch - Added new tiling mode update in intel_do_mmio_flip (Chris) v9: - check for obj->last_write_seqno being 0 instead of obj->ring being NULL in intel_postpone_flip, as this is a more restrictive condition (Chris) v10: - Applied Chris's suggestions for squashing patches 2,3 into this patch. These patches make the selection of CS vs MMIO flip at page flip time, and make the module parameter for using mmio flips a tristate, the states being 'force CS flips', 'force mmio flips', 'driver discretion'. Changed the logic for driver discretion (Chris) v11: Minor code cleanup (better readability, fixing whitespace errors, using lockdep to check mutex locked status in postpone_flip, removal of __must_check in function definition) (Chris) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Sourab Gupta <sourab.gupta@intel.com> Signed-off-by: Akash Goel <akash.goel@intel.com> Tested-by: Chris Wilson <chris@chris-wilson.co.uk> # snb, ivb [danvet: Fix up parameter alignment checkpatch spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
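The tristate decision from v10 can be pictured like this (a sketch under the assumption that a negative parameter value forces CS flips, a positive one forces MMIO flips, and zero leaves the choice to the driver; the helper name and the discretion heuristic are illustrative):

```c
static bool use_mmio_flip(struct intel_engine_cs *ring,
                          struct drm_i915_gem_object *obj)
{
        if (i915.use_mmio_flip < 0)
                return false;                   /* force CS flips */
        else if (i915.use_mmio_flip > 0)
                return true;                    /* force MMIO flips */
        else
                return ring != obj->ring;       /* discretion: avoid cross-ring sync */
}
```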
-
Committed by Chris Wilson
An object can only have an active gtt mapping if it is currently bound into the global gtt. Therefore we can simply walk the list of all bound objects and check the flag upon those for an active gtt mapping. From commit 48018a57 (Author: Paulo Zanoni <paulo.r.zanoni@intel.com>, Date: Fri Dec 13 15:22:31 2013 -0200, "drm/i915: release the GTT mmaps when going into D3"). Also note that the WARN is inappropriate for this function, as GPU activity is orthogonal to GTT mmap status. Rather it is the caller that relies upon this condition, and so it should assert that the GPU is idle itself. References: https://bugs.freedesktop.org/show_bug.cgi?id=80081 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 14 Jun 2014, 1 commit
-
-
Committed by Rodrigo Vivi
The perfect solution for psr_exit is the hardware tracking the changes and doing the psr exit by itself. This scenario works for HSW and BDW with some environments like Gnome and Wayland. However there are many other scenarios where this isn't true, the main one right now being KDE users on HSW and BDW with PSR on. Users would miss many screen updates: for instance, any key typed could be seen only when the mouse cursor is moved. So this patch introduces the ability to trigger a PSR exit on the kernel side in some common cases. Most of the cases are covered by psr_exit at set_domain; the remaining cases are covered by triggering it at busy_ioctl, sw_finish and mark_busy. The downside here might be reduced residency time in the cases where this already works very well, like the Gnome environment. But for now let's focus on fixing issues so PSR can be used by everybody, and we could even get it enabled by default. Later we can add alternatives to choose the level of PSR efficiency via a boot flag or even a crtc property. v2: remove exit from connector_dpms. Daniel pointed out this is the wrong way, and also this isn't needed for BDW and HSW anyway. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Vijay Purushothaman <vijay.a.purushothaman@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 13 Jun 2014, 2 commits
-
-
Committed by Chris Wilson
Inserting additional PTEs has no side-effect for us as the pfns are fixed for the entire time the object is resident in the global GTT. The downside is that we pay the entire cost of faulting the object upon the first hit, for which in return we receive the benefit of removing the per-page faulting overhead. On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences: Upload rate for 2 linear surfaces: 8127MiB/s -> 8134MiB/s Upload rate for 2 tiled surfaces: 8607MiB/s -> 8625MiB/s Upload rate for 4 linear surfaces: 8127MiB/s -> 8127MiB/s Upload rate for 4 tiled surfaces: 8611MiB/s -> 8602MiB/s Upload rate for 8 linear surfaces: 8114MiB/s -> 8124MiB/s Upload rate for 8 tiled surfaces: 8601MiB/s -> 8603MiB/s Upload rate for 16 linear surfaces: 8110MiB/s -> 8123MiB/s Upload rate for 16 tiled surfaces: 8595MiB/s -> 8606MiB/s Upload rate for 32 linear surfaces: 8104MiB/s -> 8121MiB/s Upload rate for 32 tiled surfaces: 8589MiB/s -> 8605MiB/s Upload rate for 64 linear surfaces: 8107MiB/s -> 8121MiB/s Upload rate for 64 tiled surfaces: 2013MiB/s -> 3017MiB/s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: "Goel, Akash" <akash.goel@intel.com> Testcase: igt/gem_fence_upload/performance Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
Committed by Chris Wilson
Now that we have a release hook into i915_gem_object_free, we can move the explicit call to the internal stolen function and hook it up through the callback instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 11 Jun 2014, 1 commit
-
-
Committed by David Herrmann
Instead of shuffling gfp-masks all the time, use the shmem_read_mapping_page() helper. Note that __GFP_IO and __GFP_WAIT are set in mapping_gfp_mask() for i915, so the behavior is still the same. Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-
- 05 Jun 2014, 1 commit
-
-
Committed by Chris Wilson
If the user tries to mmap through the GTT an object that is marked as snooped, we report an error rather than allow the GPU to hang the machine. The choice of EINVAL, however, was unfortunate, as we turn that into a WARN rather than a quiet SIGBUS. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-