提交 · 3272db53136f6be7555fb294db3a6e3f372b9380 · openeuler / raspberrypi-kernel

05 8月, 2016 2 次提交

drm/i915: Combine all i915_vma bitfields into a single set of flags · 3272db53

由 Chris Wilson 提交于 8月 04, 2016

In preparation to perform some magic to speed up i915_vma_pin(), which
is among the hottest of hot paths in execbuf, refactor all the bitfields
accessed by i915_vma_pin() into a single unified set of flags.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk

3272db53

drm/i915: Wrap vma->pin_count accessors with small inline helpers · 20dfbde4

由 Chris Wilson 提交于 8月 04, 2016

In the next few patches, the VMA pinning API is overhauled and to reduce
the churn we pull out the update to the accessors into a prep patch.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk

20dfbde4

04 8月, 2016 8 次提交

drm/i915: Track active vma requests · b0decaf7

由 Chris Wilson 提交于 8月 04, 2016

Hook the vma itself into the i915_gem_request_retire() so that we can
accurately track when a solitary vma is inactive (as opposed to having
to wait for the entire object to be idle). This improves the interaction
when using multiple contexts (with full-ppgtt) and eliminates some
frequent list walking when retiring objects after a completed request.

A side-effect is that we get an active vma reference for free. The
consequence of this is shown in the next patch...

v2: Update inline names to be consistent with
i915_gem_object_get_active()
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk

b0decaf7

drm/i915: Rename request->list to link for consistency · efdf7c06

由 Chris Wilson 提交于 8月 04, 2016

We use "list" to denote the list and "link" to denote an element on that
list. Rename request->list to match this idiom.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-14-git-send-email-chris@chris-wilson.co.uk

efdf7c06

drm/i915: Mark up i915_gem_active for locking annotation · d72d908b

由 Chris Wilson 提交于 8月 04, 2016

The future annotations will track the locking used for access to ensure
that it is always sufficient. We make the preparations now to present
the API ahead and to make sure that GCC can eliminate the unused
parameter.

Before:	6298417 3619610  696320 10614347         a1f64b vmlinux
After:	6298417 3619610  696320 10614347         a1f64b vmlinux
(with i915 builtin)
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-12-git-send-email-chris@chris-wilson.co.uk

d72d908b

drm/i915: Prepare i915_gem_active for annotations · 27c01aae

由 Chris Wilson 提交于 8月 04, 2016

In the future, we will want to add annotations to the i915_gem_active
struct. The API is thus expanded to hide direct access to the contents
of i915_gem_active and mediated instead through a number of helpers.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-11-git-send-email-chris@chris-wilson.co.uk

27c01aae

drm/i915: Introduce i915_gem_active for request tracking · 381f371b

由 Chris Wilson 提交于 8月 04, 2016

In the next patch, request tracking is made more generic and for that we
need a new expanded struct and to separate out the logic changes from
the mechanical churn, we split out the structure renaming into this
patch.

v2: Writer's block. Add some spiel about why we track requests.
v3: Now i915_gem_active.
v4: Now with i915_gem_active_set() for attaching to the active request.
v5: Use i915_gem_active_set() from inside the retirement handlers
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-10-git-send-email-chris@chris-wilson.co.uk

381f371b

drm/i915: Count how many VMA are bound for an object · 15717de2

由 Chris Wilson 提交于 8月 04, 2016

Since we may have VMA allocated for an object, but we interrupted their
binding, there is a disparity between have elements on the obj->vma_list
and being bound. i915_gem_obj_bound_any() does this check, but this is
not rigorously observed - add an explicit count to make it easier.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-7-git-send-email-chris@chris-wilson.co.uk

15717de2

drm/i915: Store owning file on the i915_address_space · 2bfa996e

由 Chris Wilson 提交于 8月 04, 2016

For the global GTT (and aliasing GTT), the address space is owned by the
device (it is a global resource) and so the per-file owner field is
NULL. For per-process GTT (where we create an address space per
context), each is owned by the opening file. We can use this ownership
information to both distinguish GGTT and ppGTT address spaces, as well
as occasionally inspect the owner.

v2: Whitespace, tells us who owns i915_address_space
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-6-git-send-email-chris@chris-wilson.co.uk

2bfa996e

drm/i915: Amalgamate GGTT/ppGTT vma debug list walkers · 894eeecc

由 Chris Wilson 提交于 8月 04, 2016

As we can now have multiple VMA inside the global GTT (with partial
mappings, rotations, etc), it is no longer true that there may just be a
single GGTT entry and so we should walk the full vma_list to count up
the actual usage. In addition to unifying the two walkers, switch from
multiplying the object size for each vma to summing the bound vma sizes.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-1-git-send-email-chris@chris-wilson.co.uk

894eeecc

03 8月, 2016 2 次提交

drm/i915: Rename struct intel_ringbuffer to struct intel_ring · 7e37f889

由 Chris Wilson 提交于 8月 02, 2016

The state stored in this struct is not only the information about the
buffer object, but the ring used to communicate with the hardware. Using
buffer here is overly specific and, for me at least, conflates with the
notion of buffer objects themselves.

s/struct intel_ringbuffer/struct intel_ring/
s/enum intel_ring_hangcheck/enum intel_engine_hangcheck/
s/describe_ctx_ringbuf()/describe_ctx_ring()/
s/intel_ring_get_active_head()/intel_engine_get_active_head()/
s/intel_ring_sync_index()/intel_engine_sync_index()/
s/intel_ring_init_seqno()/intel_engine_init_seqno()/
s/ring_stuck()/engine_stuck()/
s/intel_cleanup_engine()/intel_engine_cleanup()/
s/intel_stop_engine()/intel_engine_stop()/
s/intel_pin_and_map_ringbuffer_obj()/intel_pin_and_map_ring()/
s/intel_unpin_ringbuffer()/intel_unpin_ring()/
s/intel_engine_create_ringbuffer()/intel_engine_create_ring()/
s/intel_ring_flush_all_caches()/intel_engine_flush_all_caches()/
s/intel_ring_invalidate_all_caches()/intel_engine_invalidate_all_caches()/
s/intel_ringbuffer_free()/intel_ring_free()/
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-15-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-4-git-send-email-chris@chris-wilson.co.uk

7e37f889

drm/i915: Rename intel_context[engine].ringbuf · dca33ecc

由 Chris Wilson 提交于 8月 02, 2016

Perform s/ringbuf/ring/ on the context struct for consistency with the
ring/engine split.

v2: Kill an outdated error_ringbuf label
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469432687-22756-14-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1470174640-18242-3-git-send-email-chris@chris-wilson.co.uk

dca33ecc

02 8月, 2016 1 次提交

drm/i915/gen9: Update i915_drpc_info debugfs for coarse pg & forcewake info · f2dd7578

由 Akash Goel 提交于 6月 27, 2016

Updated the i915_drpc_info debugfs with coarse power gating & forcewake
info for Gen9.

v2: Change all IS_GEN9() by gen >= 9 (Damien)

v3: Rebase

Cc: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: NAkash Goel <akash.goel@intel.com>
Reviewed-by: NSagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1467038401-8283-1-git-send-email-akash.goel@intel.com

f2dd7578

01 8月, 2016 1 次提交

drm/i915/debugfs: Take runtime_pm ref for sseu · 238010ed

由 David Weinehall 提交于 8月 01, 2016

When reading the SSEU statistics, we need to call
intel_runtime_pm_get() first, otherwise we might end up
triggering "Device suspended during HW access".
Signed-off-by: NDavid Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470062007-26996-1-git-send-email-david.weinehall@linux.intel.comReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>

238010ed

20 7月, 2016 3 次提交

drm/i915: Convert i915_semaphores_is_enabled over to early sanitize · 39df9190

由 Chris Wilson 提交于 7月 20, 2016

Rather than recomputing whether semaphores are enabled, we can do that
computation once during early initialisation as the i915.semaphores
module parameter is now read-only.

s/i915_semaphores_is_enabled/i915.semaphores/

v2: Add the state to the debug dmesg as well
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1469005202-9659-10-git-send-email-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469017917-15134-9-git-send-email-chris@chris-wilson.co.uk

39df9190

drm/i915: Disable waitboosting for mmioflips/semaphores · 197be2ae

由 Chris Wilson 提交于 7月 20, 2016

Since commit a6f766f3 ("drm/i915: Limit ring synchronisation (sw
sempahores) RPS boosts") and commit bcafc4e3 ("drm/i915: Limit mmio
flip RPS boosts") we have limited the waitboosting for semaphores and
flips. Ideally we do not want to boost in either of these instances as no
userspace consumer is waiting upon the results (though a userspace producer
may be stalled trying to submit an execbuf - but in this case the
producer is being throttled due to the engine being saturated with
work). With the introduction of NO_WAITBOOST in the previous patch, we
can finally disable these needless boosts.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-6-git-send-email-chris@chris-wilson.co.uk

197be2ae

drm/i915: Derive GEM requests from dma-fence · 04769652

由 Chris Wilson 提交于 7月 20, 2016

dma-buf provides a generic fence class for interoperation between
drivers. Internally we use the request structure as a fence, and so with
only a little bit of interfacing we can rebase those requests on top of
dma-buf fences. This will allow us, in the future, to pass those fences
back to userspace or between drivers.

v2: The fence_context needs to be globally unique, not just unique to
this device.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1469002875-2335-4-git-send-email-chris@chris-wilson.co.uk

04769652

14 7月, 2016 2 次提交

drm/i915: Remove superfluous powersave work flushing · 62e1baa1

由 Chris Wilson 提交于 7月 13, 2016

Instead of flushing the outstanding enabling, remember the requested
frequency to apply when the powersave work runs.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-6-git-send-email-chris@chris-wilson.co.uk

62e1baa1

drm/i915: Define a separate variable and control for RPS waitboost frequency · 29ecd78d

由 Chris Wilson 提交于 7月 13, 2016

To allow the user finer control over waitboosting, allow them to set the
frequency we request for the boost. This also them allows to effectively
disable the boosting by setting the boost request to a low frequency.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-5-git-send-email-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>

29ecd78d

07 7月, 2016 1 次提交

drm/i915: s/INTEL_OUTPUT_DISPLAYPORT/INTEL_OUTPUT_DP/ · cca0502b

由 Ville Syrjälä 提交于 6月 22, 2016

INTEL_OUTPUT_DISPLAYPORT hsa been bugging me for a long time. It always
looks out of place besides INTEL_OUTPUT_EDP and INTEL_OUTPUT_DP_MST.
Let's just rename it to INTEL_OUTPUT_DP.

v2: Rebase
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NMika Kahola <mika.kahola@intel.com>
Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466621833-5054-9-git-send-email-ville.syrjala@linux.intel.com

cca0502b

06 7月, 2016 1 次提交

drm/i915: Group the irq breadcrumb variables into the same cacheline · aca34b6e

由 Chris Wilson 提交于 7月 06, 2016

As we inspect both the tasklet (to check for an active bottom-half) and
set the irq-posted flag at the same time (both in the interrupt handler
and then in the bottom-halt), group those two together into the same
cacheline. (Not having total control over placement of the struct means
we can't guarantee the cacheline boundary, we need to align the kmalloc
and then each struct, but the grouping should help.)

v2: Try a couple of different names for the state touched by the user
interrupt handler.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467805142-22219-3-git-send-email-chris@chris-wilson.co.uk

aca34b6e

05 7月, 2016 1 次提交

drm/i915: Convert dev_priv->dev backpointers to dev_priv->drm · 91c8a326

由 Chris Wilson 提交于 7月 05, 2016

Since drm_i915_private is now a subclass of drm_device we do not need to
chase the drm_i915_private->dev backpointer and can instead simply
access drm_i915_private->drm directly.

   text	   data	    bss	    dec	    hex	filename
1068757	   4565	    416	1073738	 10624a	drivers/gpu/drm/i915/i915.ko
1066949	   4565	    416	1071930	 105b3a	drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:
@@
struct drm_i915_private *d;
identifier i;
@@
(
- d->dev->i
+ d->drm.i
|
- d->dev
+ &d->drm
)

and for good measure the dev_priv->dev backpointer was removed entirely.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NMatthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467711623-2905-4-git-send-email-chris@chris-wilson.co.uk

91c8a326

04 7月, 2016 4 次提交

drm/i915: Mass convert dev->dev_private to to_i915(dev) · fac5e23e

由 Chris Wilson 提交于 7月 04, 2016

Since we now subclass struct drm_device, we can save pointer dances by
noting the equivalence of struct drm_device and struct drm_i915_private,
i.e. by using to_i915().

   text    data     bss     dec     hex filename
1073824    4562     416 1078802  107612 drivers/gpu/drm/i915/i915.ko
1068976    4562     416 1073954  106322 drivers/gpu/drm/i915/i915.ko

Created by the coccinelle script:

@@
expression E;
identifier p;
@@
- struct drm_i915_private *p = E->dev_private;
+ struct drm_i915_private *p = to_i915(E);
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NDave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467628477-25379-1-git-send-email-chris@chris-wilson.co.uk

fac5e23e

drm/i915: Limit i915_ring_test_irq debugfs to actual rings · 3a122c27

由 Chris Wilson 提交于 6月 17, 2016

For simplicity in testing, only report known rings in the mask. This
allows userspace to try and trigger a missed irq on every ring and do a
comparison between i915_ring_test_irq and i915_ring_missed_irq to see if
any rings failed.

v2: Move the debug message to after the rings are selected (so that the
message accurately reflects reality)
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466170505-8048-1-git-send-email-chris@chris-wilson.co.ukReviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>

3a122c27

drm/i915: Remove stop-rings debugfs interface · 7b4d3a16

由 Chris Wilson 提交于 7月 04, 2016

Now that we have (near) universal GPU recovery code, we can inject a
real hang from userspace and not need any fakery. Not only does this
mean that the testing is far more realistic, but we can simplify the
kernel in the process.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NArun Siluvery <arun.siluvery@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-7-git-send-email-chris@chris-wilson.co.uk

7b4d3a16

drm/i915: Only start retire worker when idle · 67d97da3

由 Chris Wilson 提交于 7月 04, 2016

The retire worker is a low frequency task that makes sure we retire
outstanding requests if userspace is being lax. We only need to start it
once as it remains active until the GPU is idle, so do a cheap test
before the more expensive queue_work(). A consequence of this is that we
need correct locking in the worker to make the hot path of request
submission cheap. To keep the symmetry and keep hangcheck strictly bound
by the GPU's wakelock, we move the cancel_sync(hangcheck) to the idle
worker before dropping the wakelock.

v2: Guard against RCU fouling the breadcrumbs bottom-half whilst we kick
the waiter.
v3: Remove the wakeref assertion squelching (now we hold a wakeref for
the hangcheck, any rpm error there is genuine).
v4: To prevent excess work when retiring requests, we split the busy
flag into two, a boolean to denote whether we hold the wakeref and a
bitmask of active engines.
v5: Reorder cancelling hangcheck upon idling to avoid a race where we
might cancel a hangcheck after being preempted by a new task
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.freedesktop.org/show_bug.cgi?id=88437Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467616119-4093-1-git-send-email-chris@chris-wilson.co.uk

67d97da3

03 7月, 2016 1 次提交

drm/i915: Fix indentation in i915_gem_framebuffer_info() · 25bcce94

由 Chris Wilson 提交于 7月 02, 2016

smatch complains:

drivers/gpu/drm/i915/i915_debugfs.c:1390 i915_frequency_info() Function
too hairy.  Giving up.
drivers/gpu/drm/i915/i915_debugfs.c:1985 i915_gem_framebuffer_info()
warn: inconsistent indenting
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1467470166-31717-3-git-send-email-chris@chris-wilson.co.ukReviewed-by: NMatthew Auld <matthew.auld@intel.com>

25bcce94

02 7月, 2016 3 次提交

drm/i915: Use HWS for seqno tracking everywhere · 1b7744e7

由 Chris Wilson 提交于 7月 01, 2016

By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.

v2: Fix semaphore_passed() to look at the signaling engine (not the
waiter's)
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk

1b7744e7

drm/i915: Spin after waking up for an interrupt · f69a02c9

由 Chris Wilson 提交于 7月 01, 2016

When waiting for an interrupt (waiting for the engine to complete some
work), we know we are the only waiter to be woken on this engine. We also
know when the GPU has nearly completed our request (or at least started
processing it), so after being woken and we detect that the GPU is
active and working on our request, allow us the bottom-half (the first
waiter who wakes up to handle checking the seqno after the interrupt) to
spin for a very short while to reduce client latencies.

The impact is minimal, there was an improvement to the realtime-vs-many
clients case, but exporting the function proves useful later. However,
it is tempting to adjust irq_seqno_barrier to include the spin. The
problem is first ensuring that the "start-of-request" seqno is coherent
as we use that as our basis for judging when it is ok to spin. If we
could, spinning there could dramatically shorten some sleeps, and allow
us to make the barriers more conservative to handle missed seqno writes
on more platforms (all gen7+ are known to have the occasional issue, at
least).
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-7-git-send-email-chris@chris-wilson.co.uk

f69a02c9

drm/i915: Slaughter the thundering i915_wait_request herd · 688e6c72

由 Chris Wilson 提交于 7月 01, 2016

One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk

688e6c72

24 6月, 2016 2 次提交

drm/i915: Split idling from forcing context switch · 6e5a5beb

由 Chris Wilson 提交于 6月 24, 2016

We only need to force a switch to the kernel context placeholder during
eviction. All other uses of i915_gpu_idle() just want to wait until
existing work on the GPU is idle. Rename i915_gpu_idle() to
i915_gem_wait_for_idle() to avoid any implications about "parking" the
context first.

v2: Tweak an error message if the wait fails for the ilk vtd w/a
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466776558-21516-6-git-send-email-chris@chris-wilson.co.uk

6e5a5beb

drm/i915: Register debugfs interface last · 1dac891c

由 Chris Wilson 提交于 6月 24, 2016

Currently debugfs files are created before the driver is even loads.
This gives the opportunity for userspace to open that interface and poke
around before the backing data structures are initialised - with the
possibility of oopsing or worse.

Move the creation of the debugfs files to our registration phase, where
we announce our presence to the world when we are ready, i.e the
sequence changes from

	drm_dev_register()
	 -> drm_minor_register()
	  -> drm_debugfs_init()
	   -> i915_debugfs_init()
	 -> i915_driver_load()

to

	drm_dev_register()
	 -> drm_minor_register()
	  -> drm_debugfs_init()
	 -> i915_driver_load()
	  -> i915_debugfs_register()
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1466773227-7994-5-git-send-email-chris@chris-wilson.co.uk

1dac891c

21 6月, 2016 5 次提交

drm/i915: Use connector_type for printing in intel_connector_info, v2. · ee648a74

由 Maarten Lankhorst 提交于 6月 21, 2016

Instead of looking at encoder->type, which may be set to UNKNOWN,
use connector->connector_type. Info cannot be printed for MST
connectors which may have a NULL encoder, return early in that case.

Changes since v1:
- Whitelist encoder types for HDMI and LVDS.
- Fix oops on MST.
- Do not list encoder types for eDP/DP, they're always valid.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/7cf34026-392d-01ec-e79b-e91919d1d783@linux.intel.com

ee648a74

drm/i915: Use atomic state and connector_type in i915_sink_src · 26c17cf6

由 Maarten Lankhorst 提交于 6月 20, 2016

DPMS is unreliable, use crtc->state.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431059-8919-4-git-send-email-maarten.lankhorst@linux.intel.comReviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>

26c17cf6

drm/i915: Use connector_type instead of intel_encoder->type for DP. · b6dabe3b

由 Maarten Lankhorst 提交于 6月 20, 2016

Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431059-8919-3-git-send-email-maarten.lankhorst@linux.intel.comReviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>

b6dabe3b

drm/i915: Use connector->name in drrs debugfs. · 26875fe5

由 Maarten Lankhorst 提交于 6月 20, 2016

This removes relying on intel_encoder->type, which may be set to
unknown.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466431059-8919-2-git-send-email-maarten.lankhorst@linux.intel.comReviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>

26875fe5

drm/i915/guc: index host arrays by i915 engine ID, not guc_id · 0b63bb14

由 Dave Gordon 提交于 6月 20, 2016

The ONLY places that guc_id (aka hw_id) should be used are those where
the value or address is determined by and shared with the GuC firmware;
specifically, when filling in the GuC-context-descriptor or the GuC
addon data, or putting an entry in the GuC's work queue.

It need not (and therefore should not) be used to index GuC statistics
or similar host-managed tracking data. In particular, i915_guc_submit()
produces (and debugfs decodes) GuC submission statistics which should be
indexed by driver-engine-id rather then guc-engine-id.
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466432287-5799-1-git-send-email-david.s.gordon@intel.com

0b63bb14

14 6月, 2016 3 次提交

drm/i915/guc: add doorbell map to debugfs/i915_guc_info · 9636f6db

由 Dave Gordon 提交于 6月 13, 2016

To properly verify the driver->doorbell->GuC functionality, validation
needs to know how the driver has assigned the doorbell cache lines and
registers, so make them visible through debugfs.

v2: use kernel bitmap-printing format (%pb) rather than %x.
Signed-off-by: NDave Gordon <david.s.gordon@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

9636f6db

drm/i915:bxt: Enable Pooled EU support · 33e141ed

由 arun.siluvery@linux.intel.com 提交于 6月 03, 2016

This mode allows to assign EUs to pools which can process work collectively.
The command to enable this mode should be issued as part of context initialization.

The pooled mode is global, once enabled it has to stay the same across all
contexts until HW reset hence this is sent in auxiliary golden context batch.
Thanks to Mika for the preliminary review and comments.

v2: explain why this is enabled in golden context, use feature flag while
enabling the support (Chris)

v3: Include only kernel support as userspace support is not available yet.

User space clients need to know when the pooled EU feature is present
and enabled on the hardware so that they can adapt work submissions.
Create a new device info flag for this purpose.

Set has_pooled_eu to true in the Broxton static device info - Broxton
supports the feature in hardware and the driver will enable it by
default.

We need to add getparam ioctls to enable userspace to query availability of
this feature and to retrieve min. no of eus in a pool but we will expose
them once userspace support is available. Opensource users for this feature
are mesa, libva and beignet.

Beignet team is currently working on adding userspace support.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Cc: Winiarski, Michal <michal.winiarski@intel.com>
Cc: Zou, Nanhai <nanhai.zou@intel.com>
Cc: Yang, Rong R <rong.r.yang@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Armin Reese <armin.c.reese@intel.com>
Cc: Tim Gore <tim.gore@intel.com>
Signed-off-by: NJeff McGee <jeff.mcgee@intel.com>
Signed-off-by: NArun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: NMichał Winiarski <michal.winiarski@intel.com>
Signed-off-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>

33e141ed

drm/i915: Fix missing unlock on error in i915_ppgtt_info() · b0212486

由 Wei Yongjun 提交于 6月 13, 2016

Add the missing unlock before return from function i915_ppgtt_info()
in the error handling case.

Fixes: 1d2ac403(drm: Protect dev->filelist with its own mutex)
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1465861320-26221-1-git-send-email-weiyj_lk@163.com

b0212486