1. 29 January 2021 (1 commit)
    • drm/scheduler: Job timeout handler returns status (v3) · a6a1f036
      Luben Tuikov authored
      This patch does not change current behaviour.
      
      The driver's job timeout handler now returns
      status indicating back to the DRM layer whether
      the device (GPU) is no longer available, such as
      after it's been unplugged, or whether all is
      normal, i.e. current behaviour.
      
      All drivers that make use of the
      drm_sched_backend_ops .timedout_job() callback
      have been updated accordingly and now return the
      would-have-been default value of
      DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
      timeout timer; this is the old behaviour, and it
      is preserved by this patch.
      
      v2: Use enum as the status of a driver's job
          timeout callback method.
      
      v3: Return scheduler/device information, rather
          than task information.
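
      As an illustration (not part of the patch text), a minimal sketch of
      what a driver's timeout callback could look like after this change;
      the callback name, ops struct and DRM_GPU_SCHED_STAT_NOMINAL come
      from the commit description, while the enum type name and the
      "mydrv" driver functions are assumptions:

          #include <drm/gpu_scheduler.h>

          /* Hypothetical driver callback showing the new return type. */
          static enum drm_gpu_sched_stat
          mydrv_job_timedout(struct drm_sched_job *sched_job)
          {
                  /* ... reset the hung engine, resubmit pending jobs ... */

                  /* Device still present: report nominal status so the
                   * scheduler restarts the task's timeout timer, i.e. the
                   * old behaviour. A driver would return a "device gone"
                   * status here instead after an unplug. */
                  return DRM_GPU_SCHED_STAT_NOMINAL;
          }

          static const struct drm_sched_backend_ops mydrv_sched_ops = {
                  /* ... .run_job, .free_job ... */
                  .timedout_job = mydrv_job_timedout,
          };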
      
      Cc: Alexander Deucher <Alexander.Deucher@amd.com>
      Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: Russell King <linux+etnaviv@armlinux.org.uk>
      Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Qiang Yu <yuq825@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Cc: Eric Anholt <eric@anholt.net>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Acked-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Link: https://patchwork.freedesktop.org/patch/415095/
  2. 19 January 2021 (1 commit)
  3. 08 December 2020 (3 commits)
  4. 17 November 2020 (1 commit)
  5. 13 November 2020 (1 commit)
  6. 19 August 2020 (1 commit)
  7. 26 June 2020 (1 commit)
  8. 15 June 2020 (1 commit)
  9. 15 April 2020 (1 commit)
  10. 26 March 2020 (2 commits)
    • drm/scheduler: fix rare NULL ptr race · 3c0fdf33
      Yintian Tao authored
      There is one corner case in dma_fence_signal_locked
      which can raise a NULL pointer problem, as shown below.
      ->dma_fence_signal
          ->dma_fence_signal_locked
      	->test_and_set_bit
      Here dma_fence_release is triggered because the fence refcount has dropped to zero.
      
      ->dma_fence_put
          ->dma_fence_release
      	->drm_sched_fence_release_scheduled
      	    ->call_rcu
      Here the union field "cb_list" of the finished fence is
      set to NULL, because struct rcu_head contains two pointers
      with the same layout as struct list_head cb_list.
      
      Therefore, hold a reference to the finished fence in drm_sched_process_job
      to prevent the NULL pointer dereference while dma_fence_signal runs on the finished fence.
      
      [  732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [  732.914815] #PF: supervisor write access in kernel mode
      [  732.915731] #PF: error_code(0x0002) - not-present page
      [  732.916621] PGD 0 P4D 0
      [  732.917072] Oops: 0002 [#1] SMP PTI
      [  732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     5.4.0-rc7 #1
      [  732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
      [  732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
      [  732.938569] Call Trace:
      [  732.939003]  <IRQ>
      [  732.939364]  dma_fence_signal+0x29/0x50
      [  732.940036]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
      [  732.940996]  drm_sched_process_job+0x34/0xa0 [gpu_sched]
      [  732.941910]  dma_fence_signal_locked+0x85/0x100
      [  732.942692]  dma_fence_signal+0x29/0x50
      [  732.943457]  amdgpu_fence_process+0x99/0x120 [amdgpu]
      [  732.944393]  sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
      
      v2: hold the finished fence at drm_sched_process_job instead of
          amdgpu_fence_process
      v3: restore the blank line
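
      A minimal sketch (not the patch itself) of the get/signal/put pattern
      described above for drm_sched_process_job; the helper name is
      hypothetical, the dma-fence calls are standard kernel API:

          #include <linux/dma-fence.h>

          /* Pin the finished fence across signalling so a concurrent
           * dma_fence_put() cannot drop the last reference and let
           * call_rcu() overwrite the cb_list union while
           * dma_fence_signal() is still walking the callback list. */
          static void sched_process_job_sketch(struct dma_fence *finished)
          {
                  dma_fence_get(finished);     /* take a reference */
                  dma_fence_signal(finished);  /* refcount > 0 here */
                  dma_fence_put(finished);     /* drop our reference */
          }
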
      Signed-off-by: Yintian Tao <yttao@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/scheduler: fix rare NULL ptr race · 77bb2f20
      Yintian Tao authored
      There is one corner case in dma_fence_signal_locked
      which can raise a NULL pointer problem, as shown below.
      ->dma_fence_signal
          ->dma_fence_signal_locked
      	->test_and_set_bit
      Here dma_fence_release is triggered because the fence refcount has dropped to zero.
      
      ->dma_fence_put
          ->dma_fence_release
      	->drm_sched_fence_release_scheduled
      	    ->call_rcu
      Here the union field "cb_list" of the finished fence is
      set to NULL, because struct rcu_head contains two pointers
      with the same layout as struct list_head cb_list.
      
      Therefore, hold a reference to the finished fence in drm_sched_process_job
      to prevent the NULL pointer dereference while dma_fence_signal runs on the finished fence.
      
      [  732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [  732.914815] #PF: supervisor write access in kernel mode
      [  732.915731] #PF: error_code(0x0002) - not-present page
      [  732.916621] PGD 0 P4D 0
      [  732.917072] Oops: 0002 [#1] SMP PTI
      [  732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G           OE     5.4.0-rc7 #1
      [  732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
      [  732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
      [  732.938569] Call Trace:
      [  732.939003]  <IRQ>
      [  732.939364]  dma_fence_signal+0x29/0x50
      [  732.940036]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
      [  732.940996]  drm_sched_process_job+0x34/0xa0 [gpu_sched]
      [  732.941910]  dma_fence_signal_locked+0x85/0x100
      [  732.942692]  dma_fence_signal+0x29/0x50
      [  732.943457]  amdgpu_fence_process+0x99/0x120 [amdgpu]
      [  732.944393]  sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
      
      v2: hold the finished fence at drm_sched_process_job instead of
          amdgpu_fence_process
      v3: restore the blank line
      Signed-off-by: Yintian Tao <yttao@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  11. 17 March 2020 (2 commits)
  12. 13 March 2020 (2 commits)
  13. 17 January 2020 (1 commit)
    • drm/scheduler: improve job distribution with multiple queues · 56822db1
      Nirmoy Das authored
      This patch uses score-based logic instead of num_jobs to select
      a new rq, for better load balancing between multiple rqs/scheds.
      
      Below are test results after running amdgpu_test from mesa drm
      
      Before this patch:
      
      sched_name     number of times it got scheduled
      =========      ==================================
      sdma0          314
      sdma1          32
      comp_1.0.0     56
      comp_1.0.1     0
      comp_1.1.0     0
      comp_1.1.1     0
      comp_1.2.0     0
      comp_1.2.1     0
      comp_1.3.0     0
      comp_1.3.1     0

      After this patch:
      
      sched_name     number of times it got scheduled
      =========      ==================================
      sdma0          216
      sdma1          185
      comp_1.0.0     39
      comp_1.0.1     9
      comp_1.1.0     12
      comp_1.1.1     0
      comp_1.2.0     12
      comp_1.2.1     0
      comp_1.3.0     12
      comp_1.3.1     0
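
      A hedged sketch (not taken from the patch) of the selection logic the
      description implies: pick the run queue whose scheduler currently has
      the lowest score instead of the lowest num_jobs. The rq/sched field
      names used here are assumptions:

          #include <drm/gpu_scheduler.h>
          #include <linux/atomic.h>
          #include <linux/limits.h>

          static struct drm_sched_rq *
          pick_least_loaded_rq(struct drm_sched_rq **rq_list, unsigned int num_rqs)
          {
                  struct drm_sched_rq *best = NULL;
                  unsigned int lowest = UINT_MAX;
                  unsigned int i;

                  for (i = 0; i < num_rqs; i++) {
                          /* Assumed per-scheduler load counter; a lower
                           * score means a less busy scheduler. */
                          unsigned int score = atomic_read(&rq_list[i]->sched->score);

                          if (score < lowest) {
                                  lowest = score;
                                  best = rq_list[i];
                          }
                  }
                  return best;
          }
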
      Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  14. 27 November 2019 (1 commit)
  15. 08 November 2019 (2 commits)
  16. 07 November 2019 (1 commit)
  17. 30 October 2019 (1 commit)
  18. 26 October 2019 (1 commit)
  19. 25 October 2019 (1 commit)
  20. 16 July 2019 (1 commit)
  21. 30 May 2019 (1 commit)
  22. 25 May 2019 (1 commit)
  23. 22 May 2019 (1 commit)
  24. 03 May 2019 (3 commits)
  25. 24 April 2019 (1 commit)
  26. 26 January 2019 (2 commits)
  27. 06 December 2018 (2 commits)
  28. 29 November 2018 (1 commit)
  29. 20 November 2018 (1 commit)
    • drm/scheduler: Fix bad job be re-processed in TDR · 85744e9c
      Trigger Huang authored
      A bad job is the one that triggered the TDR (in the current amdgpu
      implementation, actually all the jobs in the current job queue will
      be treated as bad jobs). In the recovery process, its fence
      will be fake signaled, and as a result the work scheduled afterwards
      will delete it from the mirror list; but if the TDR process is invoked
      before that work executes, this bad job may be processed again,
      and the dma_fence_set_error call on its fence in the TDR process
      will lead to a kernel warning trace:
      
      [  143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
      kernel: [  143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi
      [  143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G           OE    4.15.0-20-generic #21-Ubuntu
      [  143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  143.033653] Workqueue: events drm_sched_job_timedout [amd_sched]
      [  143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
      [  143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202
      [  143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08
      [  143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18
      [  143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6
      [  143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430
      [  143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00
      [  143.033663] FS:  0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000
      [  143.033664] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0
      [  143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  143.033670] Call Trace:
      [  143.033744]  amdgpu_device_gpu_recover+0x144/0x820 [amdgpu]
      [  143.033788]  amdgpu_job_timedout+0x9b/0xa0 [amdgpu]
      [  143.033791]  drm_sched_job_timedout+0xcc/0x150 [amd_sched]
      [  143.033795]  process_one_work+0x1de/0x410
      [  143.033797]  worker_thread+0x32/0x410
      [  143.033799]  kthread+0x121/0x140
      [  143.033801]  ? process_one_work+0x410/0x410
      [  143.033803]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  143.033806]  ret_from_fork+0x35/0x40
      
      So just delete the bad job from the mirror list directly.
      
      Changes in v3:
      	- Add a helper function to delete the bad jobs from mirror list and call
      		it directly *before* the job's fence is signaled
      
      Changes in v2:
      	- delete the useless list node check
      	- also delete bad jobs in drm_sched_main because:
      		kthread_unpark(ring->sched.thread) will be invoked very early, before
      		amdgpu_device_gpu_recover returns, so drm_sched_main will have a
      		chance to pick up a new job from the job queue. This new job will be
      		added to the mirror list and processed by amdgpu_job_run, but it may
      		not be deleted from the mirror list in time due to the same reason,
      		and it will finally be re-processed by drm_sched_job_recovery.
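
      For illustration only, a hedged sketch of the helper described in v3:
      remove the bad job from the mirror list before its fence is fake
      signaled, so a TDR running before the deferred cleanup cannot pick it
      up a second time. The lock and list field names are assumptions:

          #include <drm/gpu_scheduler.h>
          #include <linux/list.h>
          #include <linux/spinlock.h>

          static void drm_sched_delete_bad_job(struct drm_gpu_scheduler *sched,
                                               struct drm_sched_job *bad)
          {
                  spin_lock(&sched->job_list_lock);
                  /* Unlink so neither the recovery path nor the deferred
                   * cleanup work can see this job again. */
                  list_del_init(&bad->node);
                  spin_unlock(&sched->job_list_lock);
          }
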
      Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  30. 06 November 2018 (1 commit)