1. 04 Jul 2019, 1 commit
  2. 22 May 2019, 9 commits
    • drm/i915: Engine discovery query · c5d3e39c
      Tvrtko Ursulin authored
      The engine discovery query allows userspace to enumerate engines and probe
      their configuration and features, all without needing to maintain an
      internal PCI ID based database.
      
      A new query for the generic i915 query ioctl is added named
      DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure
      drm_i915_query_engine_info. The address of the latter should be passed to the
      kernel in the query.data_ptr field, and should be large enough for the
      kernel to fill out all known engines as struct drm_i915_engine_info
      elements trailing the query.
      
      As with other queries, setting the item query length to zero allows
      userspace to query the minimum required buffer size.
      
      Enumerated engines have a common type mask which can be used to query all
      hardware engines, versus engines userspace can submit to using the execbuf
      uAPI.
      
      Engines also have capabilities, which are a per engine class namespace of
      bits describing features not present on all engine instances.
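
      For illustration, a minimal two-pass flow might look as follows (a sketch
      only, assuming an open DRM fd, the usual headers and the structures named
      above from i915_drm.h; error handling omitted):
      
      	struct drm_i915_query_item item = {
      		.query_id = DRM_I915_QUERY_ENGINE_INFO,
      	};
      	struct drm_i915_query query = {
      		.num_items = 1,
      		.items_ptr = (uintptr_t)&item,
      	};
      	struct drm_i915_query_engine_info *info;
      	unsigned int i;
      
      	/* First call: length == 0 asks for the required buffer size. */
      	ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
      
      	info = calloc(1, item.length);
      	item.data_ptr = (uintptr_t)info;
      
      	/* Second call: the kernel fills out the engine array. */
      	ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
      
      	for (i = 0; i < info->num_engines; i++)
      		printf("class=%u instance=%u caps=%llx\n",
      		       info->engines[i].engine.engine_class,
      		       info->engines[i].engine.engine_instance,
      		       (unsigned long long)info->engines[i].capabilities);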
      
      v2:
       * Fixed HEVC assignment.
       * Reorder some fields, rename type to flags, increase width. (Lionel)
       * No need to allocate temporary storage if we do it engine by engine.
         (Lionel)
      
      v3:
       * Describe engine flags and mark mbz fields. (Lionel)
       * HEVC only applies to VCS.
      
      v4:
       * Squash SFC flag into main patch.
       * Tidy some comments.
      
      v5:
       * Add uabi_ prefix to engine capabilities. (Chris Wilson)
       * Report exact size of engine info array. (Chris Wilson)
       * Drop the engine flags. (Joonas Lahtinen)
       * Added some more reserved fields.
       * Move flags after class/instance.
      
      v6:
       * Do not check engine info array was zeroed by userspace but zero the
         unused fields for them instead.
      
      v7:
       * Simplify length calculation loop. (Lionel Landwerlin)
      
      v8:
       * Remove MBZ comments where not applicable.
       * Rename ABI flags to match engine class define naming.
       * Rename SFC ABI flag to reflect it applies to VCS and VECS.
       * SFC is wired to even _logical_ engine instances.
       * SFC applies to VCS and VECS.
       * HEVC is present on all instances on Gen11. (Tony)
       * Simplify length calculation even more. (Chris Wilson)
       * Move info_ptr assignment closer to loop for clarity. (Chris Wilson)
       * Use vdbox_sfc_access from runtime info.
       * Rebase for RUNTIME_INFO.
       * Refactor for lower indentation.
       * Rename uAPI class/instance to engine_class/instance to avoid C++
         keyword.
      
      v9:
       * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO.
      
      v10:
       * Use new copy_query_item.
      
      v11:
       * Consolidate with struct i915_engine_class_instance.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Jon Bloomfield <jon.bloomfield@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com
    • drm/i915: Allow specification of parallel execbuf · a88b6e4c
      Chris Wilson authored
      There is a desire to split a task onto two engines and have them run at
      the same time, e.g. scanline interleaving to spread the workload evenly.
      Through the use of the out-fence from the first execbuf, we can
      coordinate the secondary execbuf to only become ready simultaneously with
      the first, so that with all things idle the second execbufs are executed
      in parallel with the first. The key difference here between the new
      EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
      waits for the completion of the first request (so that all of its
      rendering results are visible to the second execbuf, the more common
      userspace fence requirement).
      
      Since we only have a single input fence slot, userspace cannot mix an
      in-fence and a submit-fence. It has to use one or the other! This is not
      such a harsh requirement, since by virtue of the submit-fence, the
      secondary execbuf inherits all of the dependencies from the first
      request, and for the application the dependencies should be common
      between the primary and secondary execbuf.
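
      A rough sketch of the intended flow (object lists and execbuf setup
      omitted; eb1/eb2 are ordinary struct drm_i915_gem_execbuffer2 arguments
      and are an illustration only):
      
      	/* Primary: request an out-fence to hand to the secondary. */
      	eb1.flags |= I915_EXEC_FENCE_OUT;
      	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb1);
      	fence = eb1.rsvd2 >> 32;
      
      	/* Secondary: gate submission (not completion) on the primary. */
      	eb2.flags |= I915_EXEC_FENCE_SUBMIT;
      	eb2.rsvd2 = fence;
      	ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb2);
      
      	close(fence);
      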
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Testcase: igt/gem_exec_fence/parallel
      Link: https://github.com/intel/media-driver/pull/546
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-10-chris@chris-wilson.co.uk
    • drm/i915/execlists: Virtual engine bonding · ee113690
      Chris Wilson authored
      Some users require that when a master batch is executed on one particular
      engine, a companion batch is run simultaneously on a specific slave
      engine. For this purpose, we introduce virtual engine bonding, allowing
      maps of master:slaves to be constructed to constrain which physical
      engines a virtual engine may select given a fence on a master engine.
      
      For the moment, we continue to ignore the issue of preemption deferring
      the master request for later. Ideally, we would like to then also remove
      the slave and run something else rather than have it stall the pipeline.
      With load balancing, we should be able to move workload around it, but
      there is a similar stall on the master pipeline while it may wait for
      the slave to be executed. At the cost of more latency for the bonded
      request, it may be interesting to launch both on their engines in
      lockstep. (Bubbles abound.)
      
      Opens: Also what about bonding an engine as its own master? It doesn't
      break anything internally, so allow the silliness.
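
      A sketch of describing one master:slave pair, assuming the
      I915_DEFINE_CONTEXT_ENGINES_BOND helper and the bond extension fields
      (master, virtual_index, num_bonds, engines[]); the extension is chained
      into the I915_CONTEXT_PARAM_ENGINES map that defines the virtual engine:
      
      	I915_DEFINE_CONTEXT_ENGINES_BOND(bond, 1) = {
      		.base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
      		.master = { I915_ENGINE_CLASS_VIDEO, 0 },
      		.virtual_index = 0, /* slot of the virtual engine in the map */
      		.num_bonds = 1,
      		/* if the master fence runs on vcs0, the bonded batch uses vcs1 */
      		.engines = { { I915_ENGINE_CLASS_VIDEO, 1 } },
      	};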
      
      v2: Emancipate the bonds
      v3: Couple in delayed scheduling for the selftests
      v4: Handle invalid mutually exclusive bonding
      v5: Mention what the uapi does
      v6: s/nbond/num_bonds/
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk
    • drm/i915: Load balancing across a virtual engine · 6d06779e
      Chris Wilson authored
      Having allowed the user to define the set of engines that they want
      to use, we go one step further and allow them to bind those engines
      into a single virtual instance. Submitting a batch to the virtual engine
      will then forward it to any one of the set in a manner as best to
      distribute load.  The virtual engine has a single timeline across all
      engines (it operates as a single queue), so it is not able to concurrently
      run batches across multiple engines by itself; that is left up to the user
      to submit multiple concurrent batches to multiple queues. Multiple users
      will be load balanced across the system.
      
      The mechanism used for load balancing in this patch is a late greedy
      balancer. When a request is ready for execution, it is added to each
      engine's queue, and when an engine is ready for its next request it
      claims it from the virtual engine. The first engine to do so wins, i.e.
      the request is executed at the earliest opportunity (idle moment) in the
      system.
      
      As not all HW is created equal, the user is still able to skip the
      virtual engine and execute the batch on a specific engine, all within the
      same queue. It will then be executed in order on the correct engine,
      with execution on other virtual engines being moved away due to the load
      detection.
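
      A sketch of constructing such a virtual engine from two video engines,
      assuming the I915_DEFINE_* helper macros from i915_drm.h (ctx_id is an
      existing context; creation and error handling omitted):
      
      	I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(balance, 2) = {
      		.base = { .name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE },
      		.engine_index = 0, /* which slot in the map becomes virtual */
      		.num_siblings = 2,
      		.engines = {
      			{ I915_ENGINE_CLASS_VIDEO, 0 },
      			{ I915_ENGINE_CLASS_VIDEO, 1 },
      		},
      	};
      	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
      		.extensions = (uintptr_t)&balance,
      		/* placeholder slot, replaced by the virtual engine */
      		.engines = { { I915_ENGINE_CLASS_INVALID,
      			       I915_ENGINE_CLASS_INVALID_NONE } },
      	};
      	struct drm_i915_gem_context_param p = {
      		.ctx_id = ctx_id,
      		.param = I915_CONTEXT_PARAM_ENGINES,
      		.size = sizeof(engines),
      		.value = (uintptr_t)&engines,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);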
      
      A couple of areas for potential improvement left!
      
      - The virtual engine always takes priority over equal-priority tasks.
      Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
      and hopefully the virtual and real engines are not then congested (i.e.
      all work is via virtual engines, or all work is to the real engine).
      
      - We require the breadcrumb irq around every virtual engine request. For
      normal engines, we eliminate the need for the slow round trip via
      interrupt by using the submit fence and queueing in order. For virtual
      engines, we have to allow any job to transfer to a new ring, and cannot
      coalesce the submissions, so require the completion fence instead,
      forcing the persistent use of interrupts.
      
      - We only drip feed single requests through each virtual engine and onto
      the physical engines, even if there was enough work to fill all ELSP,
      leaving small stalls with an idle CS event at the end of every request.
      Could we be greedy and fill both slots? Being lazy is virtuous for load
      distribution on less-than-full workloads though.
      
      Other areas of improvement are more general, such as reducing lock
      contention, reducing dispatch overhead, looking at direct submission
      rather than bouncing around tasklets etc.
      
      sseu: Lift the restriction to allow sseu to be reconfigured on virtual
      engines composed of RENDER_CLASS (rcs).
      
      v2: macroize check_user_mbz()
      v3: Cancel virtual engines on wedging
      v4: Commence commenting
      v5: Replace 64b sibling_mask with a list of class:instance
      v6: Drop the one-element array in the uabi
      v7: Assert it is an virtual engine in to_virtual_engine()
      v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2)
      
      Link: https://github.com/intel/media-driver/pull/283
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk
    • drm/i915: Allow userspace to clone contexts on creation · b81dde71
      Chris Wilson authored
      A usecase arose out of handling context recovery in mesa, whereby they
      wish to recreate a context with fresh logical state but preserving all
      other details of the original. Currently, they create a new context and
      iterate over which bits they want to copy across, but it would much more
      convenient if they were able to just pass in a target context to clone
      during creation. This essentially extends the setparam during creation
      to pull the details from a target context instead of the user supplied
      parameters.
      
      The ideal here is that we don't expose control over anything more than
      can be obtained via CONTEXT_PARAM. That is, userspace retains explicit
      control over all features, and this api is just convenience.
      
      For example, you could replace
      
      	struct context_param p = { .param = CONTEXT_PARAM_VM };
      
      	p.ctx_id = old_id;
      	gem_context_get_param(&p);
      
      	new_id = gem_context_create();
      
      	p.ctx_id = new_id;
      	gem_context_set_param(&p);
      
      	gem_vm_destroy(p.value); /* drop the ref to the VM_ID handle */
      
      with
      
      	struct create_ext_param p = {
      		{ .name = CONTEXT_CREATE_CLONE },
      		.clone_id = old_id,
      		.flags = CLONE_FLAGS_VM
      	};
      	new_id = gem_context_create_ext(&p);
      
      and not have to worry about stray namespace pollution etc.
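
      In terms of the raw uAPI this is a create-time extension; a sketch
      (structure and flag names as added by this patch, usage assumed):
      
      	struct drm_i915_gem_context_create_ext_clone clone = {
      		.base = { .name = I915_CONTEXT_CREATE_EXT_CLONE },
      		.clone_id = old_id,
      		.flags = I915_CONTEXT_CLONE_VM,
      	};
      	struct drm_i915_gem_context_create_ext create = {
      		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
      		.extensions = (uintptr_t)&clone,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
      	new_id = create.ctx_id;
      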
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk
    • drm/i915: Re-expose SINGLE_TIMELINE flags for context creation · 8319f44c
      Chris Wilson authored
      The SINGLE_TIMELINE flag can be used to create a context such that all
      engine instances within that context share a common timeline. This can
      be useful for mixing operations between real and virtual engines, or
      when using a composite context for a single client API context.
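
      A sketch of requesting the flag at creation time (via the extended create
      ioctl introduced earlier in this series; error handling omitted):
      
      	struct drm_i915_gem_context_create_ext create = {
      		.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
      	/* create.ctx_id: all engines in this context share one timeline */
      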
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk
    • drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] · e620f7b3
      Chris Wilson authored
      Allow the user to specify a local engine index (as opposed to
      class:index) that they can use to refer to a preset engine inside the
      ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
      This will be useful for setting SSEU parameters on virtual engines that
      are local to the context and do not have a valid global class:instance
      lookup.
      
      Note that due to the ambiguity in using class:instance with
      ctx->engines[], if a user supplied engine map is active the user must
      specify the engine to alter by its index into the ctx->engines[].
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk
    • drm/i915: Allow a context to define its set of engines · 976b55f0
      Chris Wilson authored
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines, that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, but instead uses the user defined engine map from the context.
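
      A sketch of defining a two-engine map, assuming the
      I915_DEFINE_CONTEXT_PARAM_ENGINES helper from i915_drm.h; execbuf then
      selects these engines by index 0 and 1:
      
      	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
      		.engines = {
      			{ I915_ENGINE_CLASS_RENDER, 0 },
      			{ I915_ENGINE_CLASS_COPY, 0 },
      		},
      	};
      	struct drm_i915_gem_context_param p = {
      		.ctx_id = ctx_id,
      		.param = I915_CONTEXT_PARAM_ENGINES,
      		.size = sizeof(engines),
      		.value = (uintptr_t)&engines,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);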
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
    • drm/i915: Restore control over ppgtt for context creation ABI · 7f3f317a
      Chris Wilson authored
      Having hidden the partially exposed new ABI from the PR, put it back again
      for completion of context recovery. A significant part of context
      recovery is the ability to reuse as much of the old context as is
      feasible (to avoid expensive reconstruction). The biggest chunk kept
      hidden at the moment is fine-control over the ctx->ppgtt (the GPU page
      tables and associated translation tables and kernel maps), so make
      control over the ctx->ppgtt explicit.
      
      This allows userspace to create and share virtual memory address spaces
      (within the limits of a single fd) between contexts they own, along with
      the ability to query the contexts for the vm state.
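
      A sketch of sharing one VM between two contexts via this parameter
      (getparam exports the VM of ctx_a as an id on the fd, setparam then
      points ctx_b at the same address space; error handling omitted):
      
      	struct drm_i915_gem_context_param p = {
      		.ctx_id = ctx_a,
      		.param = I915_CONTEXT_PARAM_VM,
      	};
      
      	/* Export ctx_a's address space as a vm id on this fd... */
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p);
      
      	/* ...and make ctx_b execute from the same address space. */
      	p.ctx_id = ctx_b;
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
      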
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk
  3. 17 Apr 2019, 1 commit
  4. 27 Mar 2019, 1 commit
  5. 22 Mar 2019, 4 commits
    • drm/i915: Allow contexts to share a single timeline across all engines · ea593dbb
      Chris Wilson authored
      Previously, our view has been always to run the engines independently
      within a context. (Multiple engines happened before we had contexts and
      timelines, so they always operated independently and that behaviour
      persisted into contexts.) However, at the user level the context often
      represents a single timeline (e.g. GL contexts) and userspace must
      ensure that the individual engines are serialised to present that
      ordering to the client (or forgets about this detail entirely and hopes no
      one notices - a fair ploy if the client can only directly control one
      engine themselves ;)
      
      In the next patch, we will want to construct a set of engines that
      operate as one, that have a single timeline interwoven between them, to
      present a single virtual engine to the user. (They submit to the virtual
      engine, then we decide which engine to execute on based on load.)
      
      To that end, we want to be able to create contexts which have a single
      timeline (fence context) shared between all engines, rather than multiple
      timelines.
      
      v2: Move the specialised timeline ordering to its own function.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
    • drm/i915: Extend CONTEXT_CREATE to set parameters upon construction · b9171541
      Chris Wilson authored
      It can be useful to have a single ioctl to create a context with all
      the initial parameters instead of a series of create + setparam + setparam
      ioctls. This extension to create context allows any of the parameters
      to be passed in as a linked list to be applied to the newly constructed
      context.
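
      A sketch of chaining one setparam into the create call (the parameter is
      chosen arbitrarily for illustration; names from i915_drm.h):
      
      	struct drm_i915_gem_context_create_ext_setparam ext = {
      		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
      		.param = {
      			.param = I915_CONTEXT_PARAM_PRIORITY,
      			.value = 512,
      		},
      	};
      	struct drm_i915_gem_context_create_ext create = {
      		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
      		.extensions = (uintptr_t)&ext,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);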
      
      v2: Make a local copy of user setparam (Tvrtko)
      v3: Use flags to detect availability of extension interface
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-3-chris@chris-wilson.co.uk
    • drm/i915: Create/destroy VM (ppGTT) for use with contexts · e0695db7
      Chris Wilson authored
      In preparation to making the ppGTT binding for a context explicit (to
      facilitate reusing the same ppGTT between different contexts), allow the
      user to create and destroy named ppGTT.
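
      A sketch of the create/destroy pair using the new
      struct drm_i915_gem_vm_control (error handling omitted):
      
      	struct drm_i915_gem_vm_control vm = { };
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm);
      	/* vm.vm_id now names a full ppGTT on this fd and can be assigned
      	 * to contexts via I915_CONTEXT_PARAM_VM. */
      
      	ioctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &vm);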
      
      v2: Replace global barrier for swapping over the ppgtt and tlbs with a
      local context barrier (Tvrtko)
      v3: serialise with struct_mutex; it's lazy but required dammit
      v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
      similar, turned out different!)
      
      v5: Fix up test unwind for aliasing-ppgtt (snb)
      v6: Tighten language for uapi struct drm_i915_gem_vm_control.
      v7: Patch the context image for runtime ppgtt switching!
      
      Testcase: igt/gem_vm_create
      Testcase: igt/gem_ctx_param/vm
      Testcase: igt/gem_ctx_clone/vm
      Testcase: igt/gem_ctx_shared
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-2-chris@chris-wilson.co.uk
    • drm/i915: Introduce the i915_user_extension_method · 9d1305ef
      Chris Wilson authored
      An idea for extending uABI inspired by Vulkan's extension chains.
      Instead of expanding the data struct for each ioctl every time we need
      to add a new feature, define an extension chain instead. As we add
      optional interfaces to control the ioctl, we define a new extension
      struct that can be linked into the ioctl data only when required by the
      user. The key advantage is being able to ignore large control structs for
      optional interfaces/extensions, while being able to process them in a
      consistent manner.
      
      In comparison to other extensible ioctls, the key difference is the
      use of a linked chain of extension structs vs an array of tagged
      pointers. For example,
      
      struct drm_amdgpu_cs_chunk {
              __u32           chunk_id;
              __u32           length_dw;
              __u64           chunk_data;
      };
      
      struct drm_amdgpu_cs_in {
              __u32           ctx_id;
              __u32           bo_list_handle;
              __u32           num_chunks;
              __u32           _pad;
              __u64           chunks;
      };
      
      allows userspace to pass in an array of pointers to extension structs, but
      must therefore keep constructing that array alongside the command stream.
      In dynamic situations like that, a linked list is preferred and does not
      suffer from extra cache line misses, as the extension structs themselves
      must still be loaded separately from the chunks array.
      
      v2: Apply the tail call optimisation directly to nip the worry of stack
      overflow in the bud.
      v3: Defend against recursion.
      v4: Fixup local types to match new uabi
      
      Opens:
      - do we include the result as an out-field in each chain?
      struct i915_user_extension {
      	__u64 next_extension;
      	__u64 name;
      	__s32 result;
      	__u32 mbz; /* reserved for future use */
      };
      * Undecided, so provision some room for future expansion.
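
      A sketch of how userspace builds a chain (the extension names here are
      hypothetical; each extension struct embeds struct i915_user_extension as
      its first member):
      
      	struct i915_user_extension second = {
      		.name = SECOND_EXTENSION,	/* hypothetical */
      		.next_extension = 0,		/* end of chain */
      	};
      	struct i915_user_extension first = {
      		.name = FIRST_EXTENSION,	/* hypothetical */
      		.next_extension = (uintptr_t)&second,
      	};
      
      	/* The ioctl data then only needs a single pointer to the head. */
      	args.extensions = (uintptr_t)&first;
      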
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk
  6. 06 Mar 2019, 1 commit
  7. 02 Mar 2019, 2 commits
    • drm/i915: Fix I915_EXEC_RING_MASK · d90c06d5
      Chris Wilson authored
      This was supposed to be a mask of all known rings, but it is being used
      by execbuffer to filter out invalid rings, and so is instead mapping high
      unused values onto valid rings. Instead of a mask of all known rings,
      we need it to be the mask of all possible rings.
      
      Fixes: 549f7365 ("drm/i915: Enable SandyBridge blitter ring")
      Fixes: de1add36 ("drm/i915: Decouple execbuf uAPI from internal implementation")
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: <stable@vger.kernel.org> # v4.6+
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk
    • drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ · e8861964
      Chris Wilson authored
      Having introduced per-context seqno, we now have a means to identify
      progress across the system without fear of rollback, as befell the
      global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
      advance of submission safe in the knowledge that our target seqno and
      address is stable.
      
      However, since we are telling the GPU to busy-spin on the target address
      until it matches the signaling seqno, we only want to do so when we are
      sure that busy-spin will be completed quickly. To achieve this we only
      submit the request to HW once the signaler is itself executing (modulo
      preemption causing us to wait longer), and we only do so for default and
      above priority requests (so that idle priority tasks never themselves
      hog the GPU waiting for others).
      
      As might be reasonably expected, HW semaphores excel in inter-engine
      synchronisation microbenchmarks (where the 3x reduced latency / increased
      throughput more than offset the power cost of spinning on a second ring)
      and have significant improvement (can be up to ~10%, most see no change)
      for single clients that utilize multiple engines (typically media players
      and transcoders), without regressing multiple clients that can saturate
      the system or changing the power envelope dramatically.
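
      As v4 below notes, availability is advertised through the scheduler caps;
      a sketch of the userspace check (assuming the I915_PARAM_HAS_SCHEDULER /
      I915_SCHEDULER_CAP_SEMAPHORES names):
      
      	int caps = 0, has_semaphores;
      	struct drm_i915_getparam gp = {
      		.param = I915_PARAM_HAS_SCHEDULER,
      		.value = &caps,
      	};
      
      	ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
      	has_semaphores = caps & I915_SCHEDULER_CAP_SEMAPHORES;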
      
      v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
      v4: Tell the world and include it as part of scheduler caps.
      
      Testcase: igt/gem_exec_whisper
      Testcase: igt/benchmarks/gem_wsim
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
  8. 19 Feb 2019, 1 commit
  9. 18 Feb 2019, 1 commit
  10. 05 Feb 2019, 1 commit
    • drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only) · e46c2e99
      Tvrtko Ursulin authored
      We want to allow userspace to reconfigure the subslice configuration on a
      per context basis.
      
      This is required for the functional requirement of shutting down non-VME
      enabled sub-slices on Gen11 parts.
      
      To do so, we expose a context parameter to allow adjustment of the RPCS
      register stored within the context image (and currently not accessible via
      LRI).
      
      If the context is adjusted before first use or whilst idle, the adjustment
      is for "free"; otherwise if the context is active we queue a request to do
      so (using the kernel context), following all other activity by that
      context, which is also marked as barrier for all following submission
      against the same context.
      
      Since the overhead of device re-configuration during context switching can
      be significant, especially in multi-context workloads, we limit this new
      uAPI to only support the Gen11 VME use case. In this use case either the
      device is fully enabled, or exactly one slice and half of the subslices
      are enabled.
      
      Example usage:
      
      	struct drm_i915_gem_context_param_sseu sseu = { };
      	struct drm_i915_gem_context_param arg = {
      		.param = I915_CONTEXT_PARAM_SSEU,
      		.ctx_id = gem_context_create(fd),
      		.size = sizeof(sseu),
      		.value = to_user_pointer(&sseu)
      	};
      
      	/* Query device defaults. */
      	gem_context_get_param(fd, &arg);
      
      	/* Set VME configuration on a 1x6x8 part. */
      	sseu.slice_mask = 0x1;
      	sseu.subslice_mask = 0xe0;
      	gem_context_set_param(fd, &arg);
      
      v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu()
          (Lionel)
      
      v3: Add ability to program this per engine (Chris)
      
      v4: Move most get_sseu() into i915_gem_context.c (Lionel)
      
      v5: Validate sseu configuration against the device's capabilities (Lionel)
      
      v6: Change context powergating settings through MI_SDM on kernel context
          (Chris)
      
      v7: Synchronize the requests following a powergating setting change using
          a global dependency (Chris)
          Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
          Disable RPCS configuration setting for non capable users
          (Lionel/Tvrtko)
      
      v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
          s/dev_priv/i915/ (Tvrtko)
          Change uapi class/instance fields to u16 (Tvrtko)
          Bump mask fields to 64bits (Lionel)
          Don't return EPERM when dynamic sseu is disabled (Tvrtko)
      
      v9: Import context image into kernel context's ppgtt only when
          reconfiguring powergated slice/subslices (Chris)
          Use aliasing ppgtt when needed (Michel)
      
      Tvrtko Ursulin:
      
      v10:
       * Update for upstream changes.
       * Request submit needs a RPM reference.
       * Reject on !FULL_PPGTT for simplicity.
       * Pull out get/set param to helpers for readability and less indent.
       * Use i915_request_await_dma_fence in add_global_barrier to skip waits
         on the same timeline and avoid GEM_BUG_ON.
       * No need to explicitly assign a NULL pointer to engine in legacy mode.
       * No need to move gen8_make_rpcs up.
       * Factored out global barrier as prep patch.
       * Allow to only CAP_SYS_ADMIN if !Gen11.
      
      v11:
       * Remove engine vfunc in favour of local helper. (Chris Wilson)
       * Stop retiring requests before updates since it is not needed
         (Chris Wilson)
       * Implement direct CPU update path for idle contexts. (Chris Wilson)
       * Left side dependency needs only be on the same context timeline.
         (Chris Wilson)
       * It is sufficient to order the timeline. (Chris Wilson)
       * Reject !RCS configuration attempts with -ENODEV for now.
      
      v12:
       * Rebase for make_rpcs.
      
      v13:
       * Centralize SSEU normalization to make_rpcs.
       * Type width checking (uAPI <-> implementation).
       * Gen11 restrictions uAPI checks.
       * Gen11 subslice count differences handling.
       Chris Wilson:
       * args->size handling fixes.
       * Update context image from GGTT.
       * Postpone context image update to pinning.
       * Use i915_gem_active_raw instead of last_request_on_engine.
      
      v14:
       * Add activity tracker on intel_context to fix the lifetime issues
         and simplify the code. (Chris Wilson)
      
      v15:
       * Fix context pin leak if no space in ring by simplifying the
         context pinning sequence.
      
      v16:
       * Rebase for context get/set param locking changes.
       * Just -ENODEV on !Gen11. (Joonas)
      
      v17:
       * Fix one Gen11 subslice enablement rule.
       * Handle error from i915_sw_fence_await_sw_fence_gfp. (Chris Wilson)
      
      v18:
       * Update commit message. (Joonas)
       * Restrict uAPI to VME use case. (Joonas)
      
      v19:
       * Rebase.
      
      v20:
       * Rebase for ce->active_tracker.
      
      v21:
       * Rebase for IS_GEN changes.
      
      v22:
       * Reserve uAPI for flags straight away. (Chris Wilson)
      
      v23:
       * Rebase for RUNTIME_INFO.
      
      v24:
       * Added some headline docs for the uapi usage. (Joonas/Chris)
      
      v25:
       * Renamed class/instance to engine_class/engine_instance to avoid clash
         with C++ keyword. (Tony Ye)
      
      v26:
       * Rebased for runtime pm api changes.
      
      v27:
       * Rebased for intel_context_init.
       * Wrap commit msg to 75.
      
      v28:
       (Chris Wilson)
       * Use i915_gem_ggtt.
       * Use i915_request_await_dma_fence to show a better example.
      
      v29:
       * i915_timeline_set_barrier can now fail. (Chris Wilson)
      
      v30:
       * Capture some acks.
      
      v31:
       * Drop the WARN_ON from user controllable paths. (Chris Wilson)
       * Use overflows_type for all checks.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107634
      Issue: https://github.com/intel/media-driver/issues/267
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Zhipeng Gong <zhipeng.gong@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Tony Ye <tony.ye@intel.com>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Acked-by: Timo Aaltonen <timo.aaltonen@canonical.com>
      Acked-by: Takashi Iwai <tiwai@suse.de>
      Acked-by: Stéphane Marchesin <marcheu@chromium.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-4-tvrtko.ursulin@linux.intel.com
  11. 19 Nov 2018, 1 commit
  12. 23 Oct 2018, 1 commit
  13. 27 Sep 2018, 1 commit
  14. 20 Jul 2018, 1 commit
  15. 08 Mar 2018, 2 commits
    • drm/i915: expose rcs topology through query uAPI · c822e059
      Lionel Landwerlin authored
      With the introduction of asymmetric slices in CNL, we cannot rely on
      the previous SUBSLICE_MASK getparam to tell userspace what subslices
      are available. Here we introduce a more detailed way of querying the
      Gen's GPU topology that doesn't aggregate numbers.
      
      This is essential for monitoring parts of the GPU with the OA unit,
      because counters need to be normalized to the number of
      EUs/subslices/slices. The current aggregated numbers like EU_TOTAL do
      not give us sufficient information.
      
      The Mesa series making use of this API is :
      
          https://patchwork.freedesktop.org/series/38795/
      
      As a bonus we can draw representations of the GPU :
      
          https://imgur.com/a/vuqpa
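
      A sketch of walking the returned masks (assuming the
      drm_i915_query_topology_info layout: the slice mask at the start of
      data[], then per-slice subslice masks at subslice_offset/subslice_stride;
      buf is the buffer filled by the query ioctl):
      
      	struct drm_i915_query_topology_info *topo = buf;
      	int s, ss;
      
      	for (s = 0; s < topo->max_slices; s++) {
      		const __u8 *ss_mask = topo->data + topo->subslice_offset +
      				      s * topo->subslice_stride;
      
      		if (!(topo->data[s / 8] & (1 << (s % 8))))
      			continue; /* slice fused off */
      		for (ss = 0; ss < topo->max_subslices; ss++)
      			if (ss_mask[ss / 8] & (1 << (ss % 8)))
      				printf("slice%d subslice%d enabled\n", s, ss);
      	}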
      
      v2: Rename uapi struct s/_mask/_info/ (Tvrtko)
          Report max_slice/subslice/eus_per_subslice rather than strides (Tvrtko)
          Add uapi macros to read data from *_info structs (Tvrtko)
      
      v3: Use !!(v & DRM_I915_BIT()) for uapi macros instead of custom shifts (Tvrtko)
      
      v4: factorize query item writing (Tvrtko)
          tweak uapi struct/define names (Tvrtko)
      
      v5: Replace ALIGN() macro (Chris)
      
      v6: Updated uapi comments (Tvrtko)
          Moved flags != 0 checks into vfuncs (Tvrtko)
      
      v7: Use access_ok() before copying anything, to avoid overflows (Chris)
          Switch BUG_ON() to GEM_WARN_ON() (Tvrtko)
      
      v8: Tweak uapi comments style to match the coding style (Lionel)
      
      v9: Fix error in comment about computation of enabled subslice (Tvrtko)
      
      v10: Fix/update comments in uAPI (Sagar)
      
      v11: Drop drm_i915_query_(slice|subslice|eu)_info in favor of a single
           drm_i915_query_topology_info (Joonas)
      
      v12: Add subslice_stride/eu_stride in drm_i915_query_topology_info (Joonas)
      
      v13: Fix comment in uAPI (Joonas)
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-7-lionel.g.landwerlin@intel.com
    • drm/i915: add query uAPI · a446ae2c
      Lionel Landwerlin authored
      There are a number of pieces of information readable from hardware
      registers that we would like to make accessible to userspace. One
      particular example is the topology of the execution units (how are
      execution units grouped in subslices and slices and also which ones
      have been fused off for die recovery).
      
      At the moment the GET_PARAM ioctl covers some basic needs, but
      generally is only able to return a single value for each defined
      parameter. This is a bit problematic with topology descriptions which
      are array/maps of available units.
      
      This change introduces a new ioctl that can deal with requests to fill
      structures of potentially variable lengths. The user is expected to fill
      a query with length fields set to 0 on the first call; the kernel then
      sets the length fields to their expected values. A second call to
      the kernel with length fields at their expected values will trigger a
      copy of the data to the pointed memory locations.
      
      The scope of this uAPI is only to provide information to userspace,
      not to allow configuration of the device.
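
      A sketch of the two-call pattern described above (error handling elided;
      a negative item.length is how per-item errors are reported):
      
      	struct drm_i915_query_item item = {
      		.query_id = DRM_I915_QUERY_TOPOLOGY_INFO,
      	};
      	struct drm_i915_query q = {
      		.num_items = 1,
      		.items_ptr = (uintptr_t)&item,
      	};
      	void *buf;
      
      	ioctl(fd, DRM_IOCTL_I915_QUERY, &q);	/* item.length = required size */
      	buf = calloc(1, item.length);
      	item.data_ptr = (uintptr_t)buf;
      	ioctl(fd, DRM_IOCTL_I915_QUERY, &q);	/* kernel copies the data */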
      
      v2: Simplify dispatcher code iteration (Tvrtko)
          Tweak uapi drm_i915_query_item structure (Tvrtko)
      
      v3: Rename pad fields into flags (Chris)
          Return error on flags field != 0 (Chris)
          Only copy length back to userspace in drm_i915_query_item (Chris)
      
      v4: Use array of functions instead of switch (Chris)
      
      v5: More comments in uapi (Tvrtko)
          Return query item errors in length field (All)
      
      v6: Tweak uapi comments style to match the coding style (Lionel)
      
      v7: Add i915_query.h (Joonas)
      
      v8: (Lionel) Change the behavior of the item iterator to report
          invalid queries into the query item rather than stopping the
          iteration. This enables userspace applications to query newer
          items on older kernels and only have failure on the items that are
          not supported.
      
      v9: Edit copyright headers (Joonas)
      
      v10: Typos & comments in uapi (Joonas)
      Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-6-lionel.g.landwerlin@intel.com
  16. 06 Feb 2018, 1 commit
  17. 25 Nov 2017, 1 commit
  18. 23 Nov 2017, 1 commit
  19. 22 Nov 2017, 3 commits
    • drm/i915/pmu: Add RC6 residency metrics · 6060b6ae
      Tvrtko Ursulin authored
      For clients like intel-gpu-overlay it is easier to read the
      counters via the perf API than having to parse sysfs.
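
      For example (assuming the rc6-residency event name exposed by the i915 PMU):
      
      	perf stat -a -e i915/rc6-residency/ -I 1000
      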
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-9-tvrtko.ursulin@linux.intel.com
    • drm/i915/pmu: Add interrupt count metric · 0cd4684d
      Tvrtko Ursulin authored
      For clients like intel-gpu-overlay it is easier to read the
      count via the perf API than having to parse /proc.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-7-tvrtko.ursulin@linux.intel.com
    • drm/i915/pmu: Expose a PMU interface for perf queries · b46a33e2
      Tvrtko Ursulin authored
      From: Chris Wilson <chris@chris-wilson.co.uk>
      From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      From: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      
      The first goal is to be able to measure GPU (and individual ring) busyness
      without having to poll registers from userspace. (Which not only incurs
      holding the forcewake lock indefinitely, perturbing the system, but also
      runs the risk of hanging the machine.) As an alternative we can use the
      perf event counter interface to sample the ring registers periodically
      and send those results to userspace.
      
      Functionality we are exporting to userspace is via the existing perf PMU
      API and can be exercised via the existing tools. For example:
      
        perf stat -a -e i915/rcs0-busy/ -I 1000
      
      Will print the render engine busyness once per second. All the performance
      counters can be enumerated (perf list) and have their unit of measure
      correctly reported in sysfs.
      
      v1-v2 (Chris Wilson):
      
      v2: Use a common timer for the ring sampling.
      
      v3: (Tvrtko Ursulin)
       * Decouple uAPI from i915 engine ids.
       * Complete uAPI defines.
       * Refactor some code to helpers for clarity.
       * Skip sampling disabled engines.
       * Expose counters in sysfs.
       * Pass in fake regs to avoid null ptr deref in perf core.
       * Convert to class/instance uAPI.
       * Use shared driver code for rc6 residency, power and frequency.
      
      v4: (Dmitry Rogozhkin)
       * Register PMU with .task_ctx_nr=perf_invalid_context
       * Expose cpumask for the PMU with the single CPU in the mask
       * Properly support pmu->stop(): it should call pmu->read()
       * Properly support pmu->del(): it should call stop(event, PERF_EF_UPDATE)
       * Introduce refcounting of event subscriptions.
       * Make pmu.busy_stats a refcounter to avoid busy stats going away
         with some deleted event.
       * Expose cpumask for i915 PMU to avoid multiple events creation of
         the same type followed by counter aggregation by perf-stat.
       * Track CPUs getting online/offline to migrate perf context. If (likely)
         cpumask will initially set CPU0, CONFIG_BOOTPARAM_HOTPLUG_CPU0 will be
         needed to see effect of CPU status tracking.
       * End result is that only global events are supported and perf stat
         works correctly.
       * Deny perf driver level sampling - it is prohibited for uncore PMU.
      
      v5: (Tvrtko Ursulin)
      
       * Don't hardcode number of engine samplers.
       * Rewrite event ref-counting for correctness and simplicity.
       * Store initial counter value when starting already enabled events
         to correctly report values to all listeners.
       * Fix RC6 residency readout.
       * Comments, GPL header.
      
      v6:
       * Add missing entry to v4 changelog.
       * Fix accounting in CPU hotplug case by copying the approach from
         arch/x86/events/intel/cstate.c. (Dmitry Rogozhkin)
      
      v7:
       * Log failure message only on failure.
       * Remove CPU hotplug notification state on unregister.
      
      v8:
       * Fix error unwind on failed registration.
       * Checkpatch cleanup.
      
      v9:
       * Drop the energy metric, it is available via intel_rapl_perf.
         (Ville Syrjälä)
       * Use HAS_RC6(p). (Chris Wilson)
       * Handle unsupported non-engine events. (Dmitry Rogozhkin)
       * Rebase for intel_rc6_residency_ns needing caller managed
         runtime pm.
       * Drop HAS_RC6 checks from the read callback since creating those
         events will be rejected at init time already.
       * Add counter units to sysfs so perf stat output is nicer.
       * Cleanup the attribute tables for brevity and readability.
      
      v10:
       * Fixed queued accounting.
      
      v11:
       * Move intel_engine_lookup_user to intel_engine_cs.c
       * Commit update. (Joonas Lahtinen)
      
      v12:
       * More accurate sampling. (Chris Wilson)
       * Store and report frequency in MHz for better usability from
         perf stat.
       * Removed metrics: queued, interrupts, rc6 counters.
       * Sample engine busyness based on seqno difference only
         for less MMIO (and forcewake) on all platforms. (Chris Wilson)
      
      v13:
       * Comment spelling, use mul_u32_u32 to work around potential GCC
         issue and some code alignment changes. (Chris Wilson)
      
      v14:
       * Rebase.
      
      v15:
       * Rebase for RPS refactoring.
      
      v16:
       * Use the dynamic slot in the CPU hotplug state machine so that we are
         free to setup our state as multi-instance. Previously we were re-using
         the CPUHP_AP_PERF_X86_UNCORE_ONLINE slot which is neither used as
         multi-instance, nor owned by our driver to start with.
       * Register the CPU hotplug handlers after the PMU, otherwise the callback
         will get called before the PMU is initialized which can end up in
         perf_pmu_migrate_context with an un-initialized base.
       * Added workaround for a probable bug in cpuhp core.
      
      v17:
       * Remove workaround for the cpuhp bug.
      
      v18:
       * Rebase for drm_i915_gem_engine_class getting upstream before us.
      
      v19:
       * Rebase. (trivial)
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-2-tvrtko.ursulin@linux.intel.com
  20. 13 Nov 2017, 1 commit
  21. 11 Nov 2017, 2 commits
  22. 09 Nov 2017, 1 commit
  23. 03 Nov 2017, 1 commit
  24. 06 Oct 2017, 1 commit