提交 · b81dde719439c8f09bb61e742ed95bfc4b33946b · openeuler / Kernel

22 5月, 2019 5 次提交

drm/i915: Allow userspace to clone contexts on creation · b81dde71

由 Chris Wilson 提交于 5月 21, 2019

A usecase arose out of handling context recovery in mesa, whereby they
wish to recreate a context with fresh logical state but preserving all
other details of the original. Currently, they create a new context and
iterate over which bits they want to copy across, but it would much more
convenient if they were able to just pass in a target context to clone
during creation. This essentially extends the setparam during creation
to pull the details from a target context instead of the user supplied
parameters.

The ideal here is that we don't expose control over anything more than
can be obtained via CONTEXT_PARAM. That is userspace retains explicit
control over all features, and this api is just convenience.

For example, you could replace

	struct context_param p = { .param = CONTEXT_PARAM_VM };

	param.ctx_id = old_id;
	gem_context_get_param(&p.param);

	new_id = gem_context_create();

	param.ctx_id = new_id;
	gem_context_set_param(&p.param);

	gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */

with

	struct create_ext_param p = {
	  { .name = CONTEXT_CREATE_CLONE },
	  .clone_id = old_id,
	  .flags = CLONE_FLAGS_VM
	}
	new_id = gem_context_create_ext(&p);

and not have to worry about stray namespace pollution etc.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk

b81dde71

drm/i915: Re-expose SINGLE_TIMELINE flags for context creation · 8319f44c

由 Chris Wilson 提交于 5月 21, 2019

The SINGLE_TIMELINE flag can be used to create a context such that all
engine instances within that context share a common timeline. This can
be useful for mixing operations between real and virtual engines, or
when using a composite context for a single client API context.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk

8319f44c

drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] · e620f7b3

由 Chris Wilson 提交于 5月 21, 2019

Allow the user to specify a local engine index (as opposed to
class:index) that they can use to refer to a preset engine inside the
ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
This will be useful for setting SSEU parameters on virtual engines that
are local to the context and do not have a valid global class:instance
lookup.

Note that due to the ambiguity in using class:instance with
ctx->engines[], if a user supplied engine map is active the user must
specify the engine to alter by its index into the ctx->engines[].
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk

e620f7b3

drm/i915: Allow a context to define its set of engines · 976b55f0

由 Chris Wilson 提交于 5月 21, 2019

Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, that may be sparse and
even be heterogeneous within a class (not all video decoders created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it doesn't automagically map onto
the engines, it uses the user defined engine map from the context.

v2: Fixup freeing of local on success of get_engines()
v3: Allow empty engines[]
v4: s/nengine/num_engines/
v5: Replace 64 limit on num_engines with a note that execbuf is
currently limited to only using the first 64 engines.
v6: Actually use the engines_mutex to guard the ctx->engines.

Testcase: igt/gem_ctx_engines
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk

976b55f0

drm/i915: Restore control over ppgtt for context creation ABI · 7f3f317a

由 Chris Wilson 提交于 5月 21, 2019

Having hid the partially exposed new ABI from the PR, put it back again
for completion of context recovery. A significant part of context
recovery is the ability to reuse as much of the old context as is
feasible (to avoid expensive reconstruction). The biggest chunk kept
hidden at the moment is fine-control over the ctx->ppgtt (the GPU page
tables and associated translation tables and kernel maps), so make
control over the ctx->ppgtt explicit.

This allows userspace to create and share virtual memory address spaces
(within the limits of a single fd) between contexts they own, along with
the ability to query the contexts for the vm state.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk

7f3f317a

20 4月, 2019 3 次提交

drm/msm/gpu: Add submit queue queries · b0fb6604

由 Jordan Crouse 提交于 3月 22, 2019

Add the capability to query information from a submit queue.
The first available parameter is for querying the number of GPU faults
(hangs) that can be attributed to the queue.

This is useful for implementing context robustness. A user context can
regularly query the number of faults to see if it is responsible for any
and if so it can invalidate itself.

This is also helpful for testing by confirming to the user  driver if a
particular command stream caused a fault (or not as the case may be).
Signed-off-by: NJordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: NRob Clark <robdclark@chromium.org>

b0fb6604

drm/msm: add param to retrieve # of GPU faults (global) · 48dc4241

由 Rob Clark 提交于 4月 16, 2019

For KHR_robustness, userspace wants to know two things, the count of GPU
faults globally, and the count of faults attributed to a given context.
This patch providees the former, and the next patch provides the latter.
Signed-off-by: NRob Clark <robdclark@chromium.org>
Reviewed-by: NJordan Crouse <jcrouse@codeaurora.org>

48dc4241

drm/msm/gpu: add per-process pagetables param · d674c963

由 Rob Clark 提交于 4月 15, 2019

For now it always returns '0' (false), but once the iommu work is in
place to enable per-process pagetables we can update the value returned.

Userspace needs to know this to make an informed decision about exposing
KHR_robustness.
Signed-off-by: NRob Clark <robdclark@chromium.org>
Reviewed-by: NJordan Crouse <jcrouse@codeaurora.org>

d674c963

17 4月, 2019 1 次提交

drm/i915: Introduce struct class_instance for engines across the uAPI · d1172ab3

由 Chris Wilson 提交于 4月 12, 2019

SSEU reprogramming of the context introduced the notion of engine class
and instance for a forwards compatible method of describing any engine
beyond the old execbuf interface. We wish to adopt this class:instance
description for more interfaces, so pull it out into a separate type for
userspace convenience.

Fixes: e46c2e99 ("drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Cc: Tony Ye <tony.ye@intel.com>
Cc: Andi Shyti <andi@etezian.org>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: NTony Ye <tony.ye@intel.com>
Reviewed-by: NAndi Shyti <andi@etezian.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190412071416.30097-1-chris@chris-wilson.co.uk

d1172ab3

13 4月, 2019 2 次提交

drm/panfrost: Add initial panfrost driver · f3ba9122

由 Rob Herring 提交于 9月 10, 2018

This adds the initial driver for panfrost which supports Arm Mali
Midgard and Bifrost family of GPUs. Currently, only the T860 and
T760 Midgard GPUs have been tested.

v2:
- Add GPU reset on job hangs (Tomeu)
- Add RuntimePM and devfreq support (Tomeu)
- Fix T760 support (Tomeu)
- Add a TODO file (Rob, Tomeu)
- Support multiple in fences (Tomeu)
- Drop support for shared fences (Tomeu)
- Fill in MMU de-init (Rob)
- Move register definitions back to single header (Rob)
- Clean-up hardcoded job submit todos (Rob)
- Implement feature setup based on features/issues (Rob)
- Add remaining Midgard DT compatible strings (Rob)

v3:
- Add support for reset lines (Neil)
- Add a MAINTAINERS entry (Rob)
- Call dma_set_mask_and_coherent (Rob)
- Do MMU invalidate on map and unmap. Restructure to do a single
  operation per map/unmap call. (Rob)
- Add a missing explicit padding to struct drm_panfrost_create_bo (Rob)
- Fix 0-day error: "panfrost_devfreq.c:151:9-16: ERROR: PTR_ERR applied after initialization to constant on line 150"
- Drop HW_FEATURE_AARCH64_MMU conditional (Rob)
- s/DRM_PANFROST_PARAM_GPU_ID/DRM_PANFROST_PARAM_GPU_PROD_ID/ (Rob)
- Check drm_gem_shmem_prime_import_sg_table() error code (Rob)
- Re-order power on sequence (Rob)
- Move panfrost_acquire_object_fences() before scheduling job (Rob)
- Add NULL checks on array pointers in job clean-up (Rob)
- Rework devfreq (Tomeu)
- Fix devfreq init with no regulator (Rob)
- Various WS and comments clean-up (Rob)

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: NAlyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: NEric Anholt <eric@anholt.net>
Reviewed-by: NSteven Price <steven.price@arm.com>
Signed-off-by: NMarty E. Plummer <hanetzer@startmail.com>
Signed-off-by: NTomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
Signed-off-by: NRob Herring <robh@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190409205427.6943-4-robh@kernel.org

f3ba9122

drm/amdgpu: add timeline support in amdgpu CS v3 · 2624dd15

由 Chunming Zhou 提交于 4月 01, 2019

syncobj wait/signal operation is appending in command submission.
v2: separate to two kinds in/out_deps functions
v3: fix checking for timeline syncobj
Signed-off-by: NChunming Zhou <david1.zhou@amd.com>
Cc: Tobias Hector <Tobias.Hector@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2624dd15

04 4月, 2019 1 次提交

drm/gamma: Clarify gamma lut uapi · 1fa6fa1c

由 Daniel Vetter 提交于 3月 29, 2019

Interpreting it as a 0.16 fixed point means we can't accurately
represent 1.0. Which is one of the values we really should be able to
represent.

Since most (all?) luts have lower precision this will only affect
rounding of 0xffff.

Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: "Kumar, Kiran S" <kiran.s.kumar@intel.com>
Cc: Kausal Malladi <kausalmalladi@gmail.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Rob Bradford <robert.bradford@intel.com>
Cc: Daniel Stone <daniels@collabora.com>
Cc: Stefan Schake <stschake@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: James (Qian) Wang <james.qian.wang@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Mali DP Maintainers <malidp@foss.arm.com>
Cc: CK Hu <ck.hu@mediatek.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Yannick Fertre <yannick.fertre@st.com>
Cc: Philippe Cornu <philippe.cornu@st.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Vincent Abriou <vincent.abriou@st.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Boris Brezillon <bbrezillon@kernel.org>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NLiviu Dudau <liviu.dudau@arm.com>
Acked-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NMatt Roper <matthew.d.roper@intel.com>
Reviewed-by: NPhilippe Cornu <philippe.cornu@st.com>
Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190329092027.3430-1-daniel.vetter@ffwll.ch

1fa6fa1c

02 4月, 2019 1 次提交

drm/lima: driver for ARM Mali4xx GPUs · a1d2a633

由 Qiang Yu 提交于 3月 09, 2019

- Mali 4xx GPUs have two kinds of processors GP and PP. GP is for
  OpenGL vertex shader processing and PP is for fragment shader
  processing. Each processor has its own MMU so prcessors work in
  virtual address space.
- There's only one GP but multiple PP (max 4 for mali 400 and 8
  for mali 450) in the same mali 4xx GPU. All PPs are grouped
  togather to handle a single fragment shader task divided by
  FB output tiled pixels. Mali 400 user space driver is
  responsible for assign target tiled pixels to each PP, but mali
  450 has a HW module called DLBU to dynamically balance each
  PP's load.
- User space driver allocate buffer object and map into GPU
  virtual address space, upload command stream and draw data with
  CPU mmap of the buffer object, then submit task to GP/PP with
  a register frame indicating where is the command stream and misc
  settings.
- There's no command stream validation/relocation due to each user
  process has its own GPU virtual address space. GP/PP's MMU switch
  virtual address space before running two tasks from different
  user process. Error or evil user space code just get MMU fault
  or GP/PP error IRQ, then the HW/SW will be recovered.
- Use GEM+shmem for MM. Currently just alloc and pin memory when
  gem object creation. GPU vm map of the buffer is also done in
  the alloc stage in kernel space. We may delay the memory
  allocation and real GPU vm map to command submission stage in the
  furture as improvement.
- Use drm_sched for GPU task schedule. Each OpenGL context should
  have a lima context object in the kernel to distinguish tasks
  from different user. drm_sched gets task from each lima context
  in a fair way.

mesa driver can be found here before upstreamed:
https://gitlab.freedesktop.org/lima/mesa

v8:
- add comments for in_sync
- fix ctx free miss mutex unlock

v7:
- remove lima_fence_ops with default value
- move fence slab create to device probe
- check pad ioctl args to be zero
- add comments for user/kernel interface

v6:
- fix comments by checkpatch.pl

v5:
- export gp/pp version to userspace
- rebase on drm-misc-next

v4:
- use get param interface to get info
- separate context create/free ioctl
- remove unused max sched task param
- update copyright time
- use xarray instead of idr
- stop using drmP.h

v3:
- fix comments from kbuild robot
- restrict supported arch to tested ones

v2:
- fix syscall argument check
- fix job finish fence leak since kernel 5.0
- use drm syncobj to replace native fence
- move buffer object GPU va map into kernel
- reserve syscall argument space for future info
- remove kernel gem modifier
- switch TTM back to GEM+shmem MM
- use time based io poll
- use whole register name
- adopt gem reservation obj integration
- use drm_timeout_abs_to_jiffies

Cc: Eric Anholt <eric@anholt.net>
Cc: Rob Herring <robh@kernel.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Dave Airlie <airlied@gmail.com>
Signed-off-by: NAndreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: NErico Nunes <nunes.erico@gmail.com>
Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
Signed-off-by: NMarek Vasut <marex@denx.de>
Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
Signed-off-by: NSimon Shields <simon@lineageos.org>
Signed-off-by: NVasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: NQiang Yu <yuq825@gmail.com>
Reviewed-by: NEric Anholt <eric@anholt.net>
Reviewed-by: NRob Herring <robh@kerrnel.org>
Signed-off-by: NEric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/291200/

a1d2a633

01 4月, 2019 4 次提交

drm/syncobj: add timeline signal ioctl for syncobj v5 · 50d1ebef

由 Chunming Zhou 提交于 4月 01, 2019

v2: individually allocate chain array, since chain node is free independently.
v3: all existing points must be already signaled before cpu perform signal operation,
    so add check condition for that.
v4: remove v3 change and add checking to prevent out-of-order
v5: unify binary and timeline
Signed-off-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Cc: Tobias Hector <Tobias.Hector@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/295792/?series=58813&rev=1

50d1ebef

drm/syncobj: add transition iotcls between binary and timeline v2 · ea569910

由 Chunming Zhou 提交于 4月 01, 2019

we need to import/export timeline point.

v2: unify to one transfer ioctl
Signed-off-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/295790/?series=58813&rev=1

ea569910

drm/syncobj: add timeline payload query ioctl v6 · 27b575a9

由 Chunming Zhou 提交于 4月 01, 2019

user mode can query timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query last signaled timeline point, not last point.
v6: add unorder point check
Signed-off-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Cc: Tobias Hector <Tobias.Hector@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/295784/?series=58813&rev=1

27b575a9

drm/syncobj: add support for timeline point wait v8 · 01d6c357

由 Chunming Zhou 提交于 4月 01, 2019

points array is one-to-one match with syncobjs array.
v2:
add seperate ioctl for timeline point wait, otherwise break uapi.
v3:
userspace can specify two kinds waits::
a. Wait for time point to be completed.
b. and wait for time point to become available
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences
Signed-off-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Cc: Tobias Hector <Tobias.Hector@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NLionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/295782/?series=58813&rev=1

01d6c357

27 3月, 2019 2 次提交

drm/i915: Drop new chunks of context creation ABI (for now) · 96fd2c66

由 Chris Wilson 提交于 3月 27, 2019

The intent was to expose these as part of the means to perform full
context recovery (though not the SINGLE_TIMELINE, that is for later and
just sucked as collateral damage). As that requires a couple more
patches to complete the series, roll back the earlier chunks of ABI for
an intervening PR. We keep all the internals intact and under selftests.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190327105814.14694-1-chris@chris-wilson.co.ukReviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>

96fd2c66

drm/uapi: Remove unused DRM_DISPLAY_INFO_LEN · a9282a8e

由 Ville Syrjälä 提交于 3月 26, 2019

Remove the unused DRM_DISPLAY_INFO_LEN from the uapi headers.
I presume the original plan was to expose the display name
via getconnector, but looks like that never happened. So we have
the define for the length of the string but no string anywhere.

A quick scan didn't seem to reveal userspace referencing this
so hopefully we can just nuke it.
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190326173401.7329-4-ville.syrjala@linux.intel.comReviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>

a9282a8e

22 3月, 2019 4 次提交

drm/i915: Allow contexts to share a single timeline across all engines · ea593dbb

由 Chris Wilson 提交于 3月 22, 2019

Previously, our view has been always to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forgot about this detail entirely and hope no
one notices - a fair ploy if the client can only directly control one
engine themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, then we decide which engine to execute on based.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.

v2: Move the specialised timeline ordering to its own function.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk

ea593dbb

drm/i915: Extend CONTEXT_CREATE to set parameters upon construction · b9171541

由 Chris Wilson 提交于 3月 22, 2019

It can be useful to have a single ioctl to create a context with all
the initial parameters instead of a series of create + setparam + setparam
ioctls. This extension to create context allows any of the parameters
to be passed in as a linked list to be applied to the newly constructed
context.

v2: Make a local copy of user setparam (Tvrtko)
v3: Use flags to detect availability of extension interface
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-3-chris@chris-wilson.co.uk

b9171541

drm/i915: Create/destroy VM (ppGTT) for use with contexts · e0695db7

由 Chris Wilson 提交于 3月 22, 2019

In preparation to making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow the
user to create and destroy named ppGTT.

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit
v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
similarly, turned out different!)

v5: Fix up test unwind for aliasing-ppgtt (snb)
v6: Tighten language for uapi struct drm_i915_gem_vm_control.
v7: Patch the context image for runtime ppgtt switching!

Testcase: igt/gem_vm_create
Testcase: igt/gem_ctx_param/vm
Testcase: igt/gem_ctx_clone/vm
Testcase: igt/gem_ctx_shared
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-2-chris@chris-wilson.co.uk

e0695db7

drm/i915: Introduce the i915_user_extension_method · 9d1305ef

由 Chris Wilson 提交于 3月 22, 2019

An idea for extending uABI inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain instead. As we add
optional interfaces to control the ioctl, we define a new extension
struct that can be linked into the ioctl data only when required by the
user. The key advantage being able to ignore large control structs for
optional interfaces/extensions, while being able to process them in a
consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in array of pointers to extension structs, but
must therefore keep constructing that array along side the command stream.
In dynamic situations like that, a linked list is preferred and does not
similar from extra cache line misses as the extension structs themselves
must still be loaded separate to the chunks array.

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.
v4: Fixup local types to match new uabi

Opens:
- do we include the result as an out-field in each chain?
struct i915_user_extension {
	__u64 next_extension;
	__u64 name;
	__s32 result;
	__u32 mbz; /* reserved for future use */
};
* Undecided, so provision some room for future expansion.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk

9d1305ef

21 3月, 2019 1 次提交

drm/fourcc: Fix conflicting Y41x definitions · ff01e697

由 Maarten Lankhorst 提交于 3月 19, 2019

There has unfortunately been a conflict with the following 3 commits:

commit e9961ab9
Author: Ayan Kumar Halder <ayan.halder@arm.com>
Date:   Fri Nov 9 17:21:12 2018 +0000
    drm: Added a new format DRM_FORMAT_XVYU2101010

commit 7ba0fee2
Author: Brian Starkey <brian.starkey@arm.com>
Date:   Fri Oct 5 10:27:00 2018 +0100

    drm/fourcc: Add AFBC yuv fourccs for Mali

and

commit 50bf5d7d
Author: Swati Sharma <swati2.sharma@intel.com>
Date:   Mon Mar 4 17:26:33 2019 +0530

    drm: Add Y2xx and Y4xx (xx:10/12/16) format definitions and fourcc

Unfortunately gcc didn't warn about the redefinitions, because the
double defines were the set to same value, and gcc apparently no longer
warns about that.

Fix this by using new XYVU for i915, without alpha, and making the
Y41x definitions match msdn, with alpha.

Fortunately we caught it early, and the conflict hasn't even landed in
drm-next yet.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Swati Sharma <swati2.sharma@intel.com>
Cc: Ayan Kumar Halder <ayan.halder@arm.com>
Cc: malidp@foss.arm.com
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190319121702.6814-1-maarten.lankhorst@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com> #irc
Acked-by: NSean Paul <sean@poorly.run>
Reviewed-by: NAyan Kumar halder <ayan.halder@arm.com>

ff01e697

20 3月, 2019 3 次提交

drm/amdgpu: add ioctl query for enabled ras features (v2) · 5cb77114

由 xinhui pan 提交于 12月 17, 2018

Add a query for userspace to check which RAS features
are enabled.

v2: squash in warning fix
Signed-off-by: Nxinhui pan <xinhui.pan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5cb77114

drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2 · ae363a21

由 xinhui pan 提交于 12月 17, 2018

Add AMDGPU_CTX_QUERY2_FLAGS_RAS_CE/UE which indicate if any error happened
between previous query and this query.
Signed-off-by: Nxinhui pan <xinhui.pan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ae363a21

drm/amdgpu: export ta fw info · 9b9ca62d

由 xinhui pan 提交于 11月 20, 2018

Output the ta fw, aka xgmi/ras, via debugfs.
Signed-off-by: Nxinhui pan <xinhui.pan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9b9ca62d

13 3月, 2019 3 次提交

drm/fourcc: Add 64 bpp half float formats · 88ab9c76

由 Kevin Strasser 提交于 3月 12, 2019

Add 64 bpp 16:16:16:16 half float pixel formats. Each 16 bit component is
formatted in IEEE-754 half-precision float (binary16) 1:5:10
MSb-sign:exponent:fraction form.

This patch attempts to address the feedback provided when 2 of these
formats were previosly proposed:
  https://patchwork.kernel.org/patch/10072545/

v2:
- Fixed cpp (Ville)
- Added detail pixel formatting (Ville)
- Ordered formats in header (Ville)

v5:
- .depth should be 0 for new formats (Maarten)

Cc: Tina Zhang <tina.zhang@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: NKevin Strasser <kevin.strasser@intel.com>
Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: NAdam Jackson <ajax@redhat.com>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1552437513-22648-2-git-send-email-kevin.strasser@intel.com

88ab9c76

drm: Added a new format DRM_FORMAT_XVYU2101010 · e9961ab9

由 Ayan Kumar Halder 提交于 11月 09, 2018

This new format is supported by DP550 and DP650

Changes since v3 (series):
- Added the ack
- Rebased on the latest drm-misc-next
Signed-off-by: NAyan Kumar halder <ayan.halder@arm.com>
Reviewed-by: NLiviu Dudau <liviu.dudau@arm.com>
Acked-by: NAlyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patchwork.freedesktop.org/patch/291758/?series=57895&rev=1

e9961ab9

drm/fourcc: Add AFBC yuv fourccs for Mali · 7ba0fee2

由 Brian Starkey 提交于 10月 05, 2018

As we look to enable AFBC using DRM format modifiers, we run into
problems which we've historically handled via vendor-private details
(i.e. gralloc, on Android).

AFBC (as an encoding) is fully flexible, and for example YUV data can
be encoded into 1, 2 or 3 encoded "planes", much like the linear
equivalents. Component order is also meaningful, as AFBC doesn't
necessarily care about what each "channel" of the data it encodes
contains. Therefore ABGR8888 and RGBA8888 can be encoded in AFBC with
different representations. Similarly, 'X' components may be encoded
into AFBC streams in cases where a decoder expects to decode a 4th
component.

In addition, AFBC is a licensable IP, meaning that to support the
ecosystem we need to ensure that _all_ AFBC users are able to describe
the encodings that they need. This is much better achieved by
preserving meaning in the fourcc codes when they are combined with an
AFBC modifier.

In essence, we want to use the modifier to describe the parameters of
the AFBC encode/decode, and use the fourcc code to describe the data
being encoded/decoded.

To do anything different would be to introduce redundancy - we would
need to duplicate in the modifier information which is _already_
conveyed clearly and non-ambigiously by a fourcc code.

I hope that for RGB this is non-controversial.
(BGRA8888 + MODIFIER_AFBC) is a different format from
(RGBA8888 + MODIFIER_AFBC).

Possibly more controversial is that (XBGR8888 + MODIFIER_AFBC)
is different from (BGR888 + MODIFIER_AFBC). I understand that in some
schemes it is not the case - but in AFBC it is so.

Where we run into problems is where there are not already fourcc codes
which represent the data which the AFBC encoder/decoder is processing.
To that end, we want to introduce new fourcc codes to describe the
data being encoded/decoded, in the places where none of the existing
fourcc codes are applicable.

Where we don't support an equivalent non-compressed layout, or where
no "obvious" linear layout exists, we are proposing adding fourcc
codes which have no associated linear layout - because any layout we
proposed would be completely arbitrary.

Some formats are following the naming conventions from [2].

The summary of the new formats is:
DRM_FORMAT_VUY888 - Packed 8-bit YUV 444. Y followed by U then V.
DRM_FORMAT_VUY101010 - Packed 10-bit YUV 444. Y followed by U then
V. No defined linear encoding.
DRM_FORMAT_Y210 - Packed 10-bit YUV 422. Y followed by U (then Y)
then V. 10-bit samples in 16-bit words.
DRM_FORMAT_Y410 - Packed 10-bit YUV 444, with 2-bit alpha.
DRM_FORMAT_P210 - Semi-planar 10-bit YUV 422. Y plane, followed by
interleaved U-then-V plane. 10-bit samples in
16-bit words.
DRM_FORMAT_YUV420_8BIT - Packed 8-bit YUV 420. Y followed by U then
V. No defined linear encoding
DRM_FORMAT_YUV420_10BIT - Packed 10-bit YUV 420. Y followed by U
then V. No defined linear encoding

Please also note that in the absence of AFBC, we would still need to
add Y410, Y210 and P210.

Full rationale follows:

YUV 444 8-bit, 1-plane
----------------------
The currently defined AYUV format encodes a 4th alpha component,
which makes it unsuitable for representing a 3-component YUV 444
AFBC stream.

The proposed[1] XYUV format which is supported by Mali-DP in linear
layout is also unsuitable, because the component order is the
opposite of the AFBC version, and it encodes a 4th 'X' component.

DRM_FORMAT_VUY888 is the "obvious" format for a 3-component, packed,
YUV 444 8-bit format, with the component order which our HW expects to
encode/decode. It conforms to the same naming convention as the
existing packed YUV 444 format.
The naming here is meant to be consistent with DRM_FORMAT_AYUV and
DRM_FORMAT_XYUV[1]

YUV 444 10-bit, 1-plane
-----------------------
There is no currently-defined YUV 444 10-bit format in
drm_fourcc.h, irrespective of number of planes.

The proposed[1] XVYU2101010 format which is supported by Mali-DP in
linear layout uses the wrong component order, and also encodes a 4th
'X' component, which doesn't match the AFBC version of YUV 444
10-bit which we support.

DRM_FORMAT_Y410 is the same layout as XVYU2101010, but with 2 bits of
alpha. This format is supported with linear layout by Mali GPUs. The
naming follows[2].

There is no "obvious" linear encoding for a 3-component 10:10:10
packed format, and so DRM_FORMAT_VUY101010 defines a component
order, but not a bit encoding. Again, the naming is meant to be
consistent with DRM_FORMAT_AYUV.

YUV 422 8-bit, 1-plane
----------------------
The existing DRM_FORMAT_YUYV (and the other component orders) are
single-planar YUV 422 8-bit formats. Following the convention of
the component orders of the RGB formats, YUYV has the correct
component order for our AFBC encoding (Y followed by U followed by
V). We can use YUYV for AFBC YUV 422 8-bit.

YUV 422 10-bit, 1-plane
-----------------------
There is no currently-defined YUV 422 10-bit format in drm_fourcc.h

DRM_FORMAT_Y210 is analogous to YUYV, but with 10-bits per sample
packed into the upper 10-bits of 16-bit samples. This format is
supported in both linear and AFBC by Mali GPUs.

YUV 422 10-bit, 2-plane
-----------------------
The recently defined DRM_FORMAT_P010 format is a 10-bit semi-planar
YUV 420 format, which has the correct component ordering for an AFBC
2-plane YUV 420 buffer. The linear layout contains meaningless padding
bits, which will not be encoded in an AFBC stream.

YUV 420 8-bit, 1-plane
----------------------
There is no currently defined single-planar YUV 420, 8-bit format
in drm_fourcc.h. There's differing opinions on whether using the
existing fourcc-implied n_planes where possible is a good idea or
not when using modifiers.

For me, it's much more "obvious" to use NV12 for 2-plane AFBC and
YUV420 for 3-plane AFBC. This keeps the aforementioned separation
between the AFBC codec settings (in the modifier) and the pixel data
format (in the fourcc). With different vendors using AFBC, this helps
to ensure that there is no confusion in interoperation. It also
ensures that the AFBC modifiers describe AFBC itself (which is a
licensable component), and not implementation details which are not
defined by AFBC.

The proposed[1] X0L0 format which Mali-DP supports with Linear layout
is unsuitable, as it contains a 4th 'X' component, and our AFBC
decoder expects only 3 components.

To that end, we propose a new YUV 420 8-bit format. There is no
"obvious" linear encoding for a 3-component 8:8:8, 420, packed format,
and so DRM_FORMAT_YUV420_8BIT defines a component order, but not a
bit encoding. I'm happy to hear different naming suggestions.

YUV 420 8-bit, 2-, 3-plane
--------------------------
These already exist, we can use NV12 and YUV420.

YUV 420 10-bit, 1-plane
-----------------------
As above, no current definition exists, and X0L2 encodes a 4th 'X'
channel.

Analogous to DRM_FORMAT_YUV420_8BIT, we define DRM_FORMAT_YUV420_10BIT.

[1] https://lists.freedesktop.org/archives/dri-devel/2018-July/184598.html
[2] https://docs.microsoft.com/en-us/windows/desktop/medfound/10-bit-and-16-bit-yuv-video-formats

Changes since RFC v1:
- Fix confusing subsampling vs bit-depth X:X:X notation in
descriptions (danvet)
- Rename DRM_FORMAT_AVYU1101010 to DRM_FORMAT_Y410 (Lisa Wu)
- Add drm_format_info structures for the new formats, using the
new 'bpp' field for those with non-integer bytes-per-pixel
- Rebase, including Juha-Pekka Heikkila's format definitions

Changes since RFC v2:
- Rebase on top of latest changes in drm-misc-next
- Change the description of DRM_FORMAT_P210 in __drm_format_info and
drm_fourcc.h so as to make it consistent with other DRM_FORMAT_PXXX
formats.

Changes since v3:
- Added the ack
- Rebased on the latest drm-misc-next
Signed-off-by: NBrian Starkey <brian.starkey@arm.com>
Signed-off-by: NAyan Kumar Halder <ayan.halder@arm.com>
Reviewed-by: NLiviu Dudau <liviu.dudau@arm.com>
Acked-by: NAlyssa Rosenzweig <alyssa@rosenzweig.io>
Link: https://patchwork.freedesktop.org/patch/291759/?series=57895&rev=1

7ba0fee2

06 3月, 2019 1 次提交

drm/i915: Remove last traces of exec-id (GEM_BUSY) · c8b50242

由 Chris Wilson 提交于 3月 05, 2019

As we allow per-context engine allows the legacy concept of
I915_EXEC_RING no longer applies universally. We are still exposing the
unrelated exec-id in GEM_BUSY, so transition this ioctl (once more
slightly changing its ABI, but no one cares) over to only reporting the
uabi-class (not instance as we can not foreseeably fit those into the
small bitmask).

The only user of the extended ring information from GEM_BUSY is ddx/sna,
which tries to use the non-rcs business information to guide which
engine to use for subsequent operations on foreign bo. All that matters
for it is the decision between rcs and !rcs, so it is unaffected by the
change in higher bits.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305162643.20243-1-chris@chris-wilson.co.uk

c8b50242

05 3月, 2019 1 次提交

drm: Add Y2xx and Y4xx (xx:10/12/16) format definitions and fourcc · 50bf5d7d

由 Swati Sharma 提交于 3月 04, 2019

The following pixel formats are packed format that follows 4:2:2
chroma sampling. For memory represenation each component is
allocated 16 bits each. Thus each pixel occupies 32bit.

Y210: For each component, valid data occupies MSB 10 bits.
LSB 6 bits are filled with zeroes.
Y212: For each component, valid data occupies MSB 12 bits.
LSB 4 bits are filled with zeroes.
Y216: For each component valid data occupies 16 bits,
doesn't require any padding bits.

First 16 bits stores the Y value and the next 16 bits stores one
of the chroma samples alternatively. The first luma sample will
be accompanied by first U sample and second luma sample is
accompanied by the first V sample.

The following pixel formats are packed format that follows 4:4:4
chroma sampling. Channels are arranged in the order UYVA in
increasing memory order.

Y410: Each color component occupies 10 bits and X component
takes 2 bits, thus each pixel occupies 32 bits.
Y412: Each color component is 16 bits where valid data
occupies MSB 12 bits. LSB 4 bits are filled with zeroes.
Thus, each pixel occupies 64 bits.
Y416: Each color component occupies 16 bits for valid data,
doesn't require any padding bits. Thus, each pixel
occupies 64 bits.

v3: fixed missing tab for XYUV8888 (JP)
Signed-off-by: NSwati Sharma <swati2.sharma@intel.com>
Signed-off-by: NVidya Srinivas <vidya.srinivas@intel.com>
Reviewed-by: NJuha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1551700595-21481-5-git-send-email-swati2.sharma@intel.com

50bf5d7d

02 3月, 2019 2 次提交

drm/i915: Fix I915_EXEC_RING_MASK · d90c06d5

由 Chris Wilson 提交于 3月 01, 2019

This was supposed to be a mask of all known rings, but it is being used
by execbuffer to filter out invalid rings, and so is instead mapping high
unused values onto valid rings. Instead of a mask of all known rings,
we need it to be the mask of all possible rings.

Fixes: 549f7365 ("drm/i915: Enable SandyBridge blitter ring")
Fixes: de1add36 ("drm/i915: Decouple execbuf uAPI from internal implementation")
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk

d90c06d5

drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ · e8861964

由 Chris Wilson 提交于 3月 01, 2019

Having introduced per-context seqno, we now have a means to identity
progress across the system without feel of rollback as befell the
global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in
advance of submission safe in the knowledge that our target seqno and
address is stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the 3x reduced latency / increased
throughput more than offset the power cost of spinning on a second ring)
and have significant improvement (can be up to ~10%, most see no change)
for single clients that utilize multiple engines (typically media players
and transcoders), without regressing multiple clients that can saturate
the system or changing the power envelope dramatically.

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk

e8861964

20 2月, 2019 2 次提交

drm/nouveau/svm: new ioctl to migrate process memory to GPU memory · f180bf12

由 Jérôme Glisse 提交于 8月 07, 2018

This add an ioctl to migrate a range of process address space to the
device memory. On platform without cache coherent bus (x86, ARM, ...)
this means that CPU can not access that range directly, instead CPU
will fault which will migrate the memory back to system memory.

This is behind a staging flag so that we can evolve the API.
Signed-off-by: NJérôme Glisse <jglisse@redhat.com>

f180bf12

drm/nouveau/svm: initial support for shared virtual memory · eeaf06ac

由 Ben Skeggs 提交于 7月 05, 2018

This uses HMM to mirror a process' CPU page tables into a channel's page
tables, and keep them synchronised so that both the CPU and GPU are able
to access the same memory at the same virtual address.

While this code also supports Volta/Turing, it's only enabled for Pascal
GPUs currently due to channel recovery being unreliable right now on the
later GPUs.
Signed-off-by: NBen Skeggs <bskeggs@redhat.com>

eeaf06ac

19 2月, 2019 1 次提交

drm/i915: Include reminders about leaving no holes in uAPI enums · be03564b

由 Chris Wilson 提交于 2月 18, 2019

We don't want to pre-reserve any holes in our uAPI for that is a sign of
nefarious and hidden activity. Add a reminder about our uAPI
expectations to encourage good practice when adding new defines/enums.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190218094628.13522-1-chris@chris-wilson.co.uk

be03564b

18 2月, 2019 1 次提交

drm/i915: Optionally disable automatic recovery after a GPU reset · ba4fda62

由 Chris Wilson 提交于 2月 18, 2019

Some clients, such as mesa, may only emit minimal incremental batches
that rely on the logical context state from previous batches. They know
that recovery is impossible after a hang as their required GPU state is
lost, and that each in flight and subsequent batch will hang (resetting
the context image back to default perpetuating the problem).

To avoid getting into the state in the first place, we can allow clients
to opt out of automatic recovery and elect to ban any guilty context
following a hang. This prevents the continual stream of hangs and allows
the client to recreate their context and rebuild the state from scratch.

v2: Prefer calling it recoverable rather than unrecoverable.

References: https://lists.freedesktop.org/archives/mesa-dev/2019-February/215431.htmlSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org> # for mesa
Link: https://patchwork.freedesktop.org/patch/msgid/20190218105821.17293-1-chris@chris-wilson.co.uk

ba4fda62

16 2月, 2019 1 次提交

drm/amdgpu: Add command to override the context priority. · b5bb37ed

由 Bas Nieuwenhuizen 提交于 1月 30, 2019

Given a master fd we can then override the priority of the context
in another fd.

Using these overrides was recommended by Christian instead of trying
to submit from a master fd, and I am adding a way to override a
single context instead of the entire process so we can only upgrade
a single Vulkan queue and not effectively the entire process.

Reused the flags field as it was checked to be 0 anyways, so nothing
used it. This is source-incompatible (due to the name change), but
ABI compatible.
Signed-off-by: NBas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b5bb37ed

09 2月, 2019 1 次提交

drm/fourcc: Add new P010, P016 video format · 05f8bc82

由 Randy Li 提交于 1月 10, 2019

P010 is a planar 4:2:0 YUV with interleaved UV plane, 10 bits per
channel video format.

P012 is a planar 4:2:0 YUV 12 bits per channel

P016 is a planar 4:2:0 YUV with interleaved UV plane, 16 bits per
channel video format.

V3: Added P012 and fixed cpp for P010.
V4: format definition refined per review.
V5: Format comment block for each new pixel format.
V6: reversed Cb/Cr order in comments.
v7: reversed Cb/Cr order in comments of header files, remove
the wrong part of commit message.
V8: reversed V7 changes except commit message and rebased.
v9: used the new properties to describe those format and
rebased.

Cc: Daniel Stone <daniel@fooishbar.org>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: NRandy Li <ayaka@soulik.info>
Signed-off-by: NClint Taylor <clinton.a.taylor@intel.com>
Reviewed-by: NAyan Kumar Halder <ayan.halder@arm.com>
Reviewed-by: NNeil Armstrong <narmstrong@baylibre.com>
Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190109195710.28501-2-ayaka@soulik.info

05f8bc82

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功