Commits · cc063ea2ec7cc091639e6c95eb93e97d6e2ed6e3 · openeuler / Kernel

11 Jul, 2020 1 commit

drm/amdgpu: don't do soft recovery if gpu_recovery=0 · cc063ea2

Committed by Marek Olšák on Jul 06, 2020

It's impossible to debug shader hangs with soft recovery.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cc063ea2

01 Jul, 2020 1 commit

drm/amdgpu: remove distinction between explicit and implicit sync (v2) · 174b328b

Committed by Christian König on May 27, 2020

According to Marek, a pipeline sync should be inserted for implicit syncs as well.

v2: bump the driver version

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

174b328b

24 Apr, 2020 1 commit

drm/amdgpu: remove set but not used variable 'priority' · 00aba6da

Committed by YueHaibing on Apr 21, 2020

drivers/gpu/drm/amd/amdgpu/amdgpu_job.c: In function amdgpu_job_submit:
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:148:26: warning: variable priority set but not used [-Wunused-but-set-variable]

Commit 33abcb1f ("drm/amdgpu: set compute queue priority at mqd_init")
left this behind; remove it.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

00aba6da

14 Apr, 2020 1 commit

drm/amdgpu: restrict debugfs register access under SR-IOV · 95a2f917

Committed by Yintian Tao on Apr 07, 2020

On bare metal, nothing else needs to be taken care of
for GPU register access through MMIO.
Under virtualization, GPU register access is
implemented through KIQ at run time because of
world switches.

Therefore, under SR-IOV the user can only read/write
GPU registers through debugfs when all three
conditions below are met.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

95a2f917
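
The three conditions listed in commit 95a2f917 can be read as a single predicate. The following is a minimal user-space sketch in C, not the driver code: the struct `vf_debug_state`, its fields, and the function name are hypothetical stand-ins for the state the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state named in the commit message. */
struct vf_debug_state {
    int  gpu_recovery;  /* value of the gpu_recovery setting */
    bool tdr_happened;  /* a timeout detection and recovery event occurred */
    bool in_gpu_reset;  /* a GPU reset is currently in progress */
};

/* True only when all three conditions from the commit message hold. */
static bool sriov_debugfs_access_allowed(const struct vf_debug_state *s)
{
    return s->gpu_recovery == 0 && s->tdr_happened && !s->in_gpu_reset;
}
```

Under this reading, a VF with recovery enabled, or one that is in the middle of a reset, is refused debugfs register access.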

02 Apr, 2020 1 commit

drm/amdgpu: implement more ib pools (v2) · c8e42d57

Committed by xinhui pan on Mar 26, 2020

We have three IB pools: normal, VM, and direct.

Any job that schedules IBs without depending on the GPU scheduler should
use the DIRECT pool.

Any job that schedules direct VM update IBs should use the VM pool.

Any other job uses the NORMAL pool.

v2: squash in coding style fix

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c8e42d57
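
The pool selection rule in commit c8e42d57 could be sketched roughly as below. This is an illustrative user-space sketch only: the enum, the function, and both flags are hypothetical names, and checking the VM case before the direct case is an assumption made here, since the commit message does not state which rule takes precedence.

```c
#include <stdbool.h>

/* Hypothetical names for the three pools described in the commit message. */
enum sketch_ib_pool {
    SKETCH_IB_POOL_NORMAL,
    SKETCH_IB_POOL_VM,
    SKETCH_IB_POOL_DIRECT
};

/*
 * Pick a pool for a job.
 * direct_vm_update:   the job schedules direct VM update IBs.
 * bypasses_scheduler: the job schedules IBs without the GPU scheduler.
 * Assumption: the VM rule is checked first.
 */
static enum sketch_ib_pool pick_ib_pool(bool direct_vm_update,
                                        bool bypasses_scheduler)
{
    if (direct_vm_update)
        return SKETCH_IB_POOL_VM;
    if (bypasses_scheduler)
        return SKETCH_IB_POOL_DIRECT;
    return SKETCH_IB_POOL_NORMAL;
}
```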

10 Mar, 2020 1 commit

drm/amdgpu: set compute queue priority at mqd_init · 33abcb1f

Committed by Nirmoy Das on Feb 27, 2020

We were changing compute ring priority before every job submission,
while the rings were in use, which is not recommended. This patch
sets compute queue priority at mqd initialization for gfx8, gfx9 and
gfx10.

Policy: make queue 0 of each pipe a high-priority compute queue.

High/normal priority compute sched lists are generated from the set of high/normal
priority compute queues. At context creation, the entity of a compute queue
gets a sched list from the high or normal priority list depending on ctx->priority.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

33abcb1f
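
As a rough illustration of the policy in commit 33abcb1f, the sketch below partitions compute queues into high and normal priority sets by queue index. It is a user-space sketch with hypothetical function names; the pipe and queue counts passed in are arbitrary example inputs, not values taken from the driver.

```c
#include <stdbool.h>

/* Policy from the commit message: queue 0 of each pipe is high priority. */
static bool is_high_priority_compute_queue(int queue_in_pipe)
{
    return queue_in_pipe == 0;
}

/* Count how many queues fall into each set for a given layout. */
static void count_compute_queues(int num_pipes, int queues_per_pipe,
                                 int *num_high, int *num_normal)
{
    *num_high = 0;
    *num_normal = 0;
    for (int pipe = 0; pipe < num_pipes; pipe++) {
        for (int queue = 0; queue < queues_per_pipe; queue++) {
            if (is_high_priority_compute_queue(queue))
                (*num_high)++;
            else
                (*num_normal)++;
        }
    }
}
```

With this policy the number of high-priority queues always equals the number of pipes, whatever the number of queues per pipe.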

17 Jan, 2020 1 commit

drm/amdgpu: drop amdgpu_job.owner · 971fe555

Committed by Christian König on Dec 16, 2019

Entirely unused.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

971fe555

10 Dec, 2019 1 commit

drm/amdgpu: explicitely sync to VM updates v2 · e095fc17

Committed by Christian König on Nov 29, 2019

Allows us to reduce the overhead while syncing to fences a bit.

v2: also drop adev parameter from the functions

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e095fc17

30 Oct, 2019 1 commit

drm/amdgpu: If amdgpu_ib_schedule fails return back the error. · 57c0f58e

Committed by Andrey Grodzovsky on Oct 24, 2019

Use ERR_PTR to return the error that occurred during amdgpu_ib_schedule.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

57c0f58e
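
Commit 57c0f58e relies on the kernel's ERR_PTR convention, in which a small negative errno value is encoded in a pointer return value. The sketch below re-creates that convention in user space to show the idea; the `sketch_` helpers, `struct sketch_fence`, and `sketch_run_job()` are hypothetical names written for this example and are not the kernel's or the driver's definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* True if the pointer lies in the top SKETCH_MAX_ERRNO addresses. */
static bool sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

struct sketch_fence { int seqno; };

/* Return the fence on success, or the scheduling error as a pointer. */
static struct sketch_fence *sketch_run_job(int schedule_result,
                                           struct sketch_fence *fence)
{
    if (schedule_result < 0)
        return sketch_err_ptr(schedule_result);
    return fence;
}
```

A caller checks the result with `sketch_is_err()` before using it as a fence, and reads the error code with `sketch_ptr_err()` otherwise.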

26 Oct, 2019 1 commit

drm/amdgpu: If amdgpu_ib_schedule fails return back the error. · db5e65fc

Committed by Andrey Grodzovsky on Oct 24, 2019

db5e65fc

14 Sep, 2019 1 commit

drm/amdgpu: Avoid HW GPU reset for RAS. · 7c6e68c7

Committed by Andrey Grodzovsky on Sep 13, 2019

Problem:
Under certain conditions, when some IP blocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until a proper solution in PSP/SMU is ready:
When an uncorrectable error happens, the DF will unconditionally
broadcast error event packets to all its clients/slaves upon
receiving the fatal error event and freeze all its outbound queues;
the err_event_athub interrupt will be triggered.
In such a case we use this interrupt
to issue a GPU reset. The GPU reset code is modified for this case to avoid a HW
reset: it only stops the schedulers, detaches the fences of all in-progress and
not yet scheduled jobs, sets an error code on them, and signals them.
It also rejects any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on previous bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive members.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7c6e68c7

11 Jun, 2019 1 commit

drm/amd: drop use of drmP.h in amdgpu/amdgpu* · fdf2f6c5

Committed by Sam Ravnborg on Jun 10, 2019

Drop use of drmP.h in all files named amdgpu*
in drm/amd/amdgpu/

Fix fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org

fdf2f6c5

25 May, 2019 1 commit

drm/amdgpu: suppress repeating tmo report · c3b6c607

Committed by Monk Liu on May 13, 2019

Only report once per TMO job; the timer is
restarted when the job finishes if it was just slow.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c3b6c607

21 Dec, 2018 1 commit

drm/amdgpu: print process info when job timeout · 0346bfd9

Committed by Trigger Huang on Dec 18, 2018

When a job times out, try to print the related process information
for debugging.

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0346bfd9

06 Nov, 2018 1 commit

drm/scheduler: Add drm_sched_job_cleanup · 26efecf9

Committed by Sharat Masetty on Oct 29, 2018

This patch adds a new API to clean up the scheduler job resources. This
is primarily needed in cases where the job was created but was not queued to
the scheduler queue. Additionally, with this change, the layer which
creates the scheduler job also gets to free up the job's resources, and
this entails moving the dma_fence_put(finished_fence) to the drivers'
ops free handler routines.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

26efecf9

12 Sep, 2018 1 commit

drm/amdgpu: Fix SDMA TO after GPU reset v3 · d8de8260

Committed by Andrey Grodzovsky on Sep 10, 2018

After a GPU reset, amdgpu_vm_clear_bo triggers a VM flush,
but job->vm_pd_addr is not set, causing an SDMA TO.

v2:
Per advice from Christian König, avoid flushing the VM for jobs where
job->vm_pd_addr wasn't explicitly set.

v3:
Shortcut vm_flush_needed early.

Fixes cbd52851 drm/amdgpu: move setting the GART addr into TTM.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d8de8260

28 Aug, 2018 3 commits

drm/amdgpu: add ring soft recovery v4 · 7876fa4f

Committed by Christian König on Aug 21, 2018

Instead of hammering hard on the GPU, try a soft recovery first.

v2: reorder code a bit
v3: increase timeout to 10ms, increment GPU reset counter
v4: squash in compile fix (Christian)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>

7876fa4f

drm/amdgpu: move setting the GART addr into TTM · cbd52851

Committed by Christian König on Aug 21, 2018

Move setting the GART addr for window based copies into the TTM code which
uses it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cbd52851

drm/amdgpu: cleanup GPU recovery check a bit (v2) · 12938fad

Committed by Christian König on Aug 21, 2018

Check if we should call the function instead of providing the forced
flag.

v2: rebase on KFD changes (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

12938fad

26 Jul, 2018 2 commits

drm/scheduler: remove sched field from the entity · 068c3304

Committed by Nayan Deshmukh on Jul 20, 2018

The scheduler of the entity is decided by the run queue on which
it is queued. This patch saves us the effort of keeping the rq and
sched fields in sync when we start shifting entities
among different rqs.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

068c3304

drm/scheduler: modify API to avoid redundancy · cdc50176

Committed by Nayan Deshmukh on Jul 20, 2018

The entity has a scheduler field, so we don't need the sched argument
in any of the functions where the entity is provided.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cdc50176

19 Jul, 2018 1 commit

drm/amdgpu: change ring priority after pushing the job (v2) · b5286801

Committed by Christian König on Jul 16, 2018

Pushing a job can change the ring assignment of an entity.

v2: squash in:
"drm/amdgpu: fix job priority handling" (Christian)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b5286801

18 Jul, 2018 2 commits

drm/amdgpu: minor cleanup in amdgpu_job.c · f024e883

Committed by Christian König on Jul 13, 2018

Remove superfluous NULL check, fix coding style a bit, shorten error
messages.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f024e883

drm/amdgpu: remove job->adev (v2) · a1917b73

Committed by Christian König on Jul 13, 2018

We can get that from the ring.

v2: squash in "drm/amdgpu: always initialize job->base.sched" (Alex)

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a1917b73

17 Jul, 2018 4 commits

drm/amdgpu: add amdgpu_job_submit_direct helper · ee913fd9

Committed by Christian König on Jul 13, 2018

Make sure that we properly initialize at least the sched member.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ee913fd9

drm/amdgpu: remove job->ring · 3320b8d2

Committed by Christian König on Jul 13, 2018

We can easily get that from the scheduler.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3320b8d2

drm/amdgpu: remove ring parameter from amdgpu_job_submit · 0e28b10f

Committed by Christian König on Jul 13, 2018

We know the ring through the entity anyway.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0e28b10f

drm/amdgpu: remove fence context from the job · eb3961a5

Committed by Christian König on Jul 13, 2018

Can be obtained directly from the fence as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

eb3961a5

28 Dec, 2017 2 commits

drm/amdgpu: rename vm_id to vmid · c4f46f22

Committed by Christian König on Dec 18, 2017

sed -i "s/vm_id/vmid/g" drivers/gpu/drm/amd/amdgpu/*.c
sed -i "s/vm_id/vmid/g" drivers/gpu/drm/amd/amdgpu/*.h

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c4f46f22

drm/amdgpu: separate VMID and PASID handling · 620f774f

Committed by Christian König on Dec 18, 2017

Move both into the new files amdgpu_ids.[ch]. No functional change.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

620f774f

18 Dec, 2017 1 commit

drm/amdgpu: rename amdgpu_gpu_recover · 5f152b5e

Committed by Alex Deucher on Dec 15, 2017

Add device to the name for consistency.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5f152b5e

16 Dec, 2017 1 commit

drm/amdgpu: Add gpu_recovery parameter · dcebf026

Committed by Andrey Grodzovsky on Dec 12, 2017

Add a new parameter to control the GPU recovery procedure.

v2:
Add auto logic where reset is disabled for bare metal and enabled
for SR-IOV.
Allow forced reset from debugfs.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dcebf026
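
The "auto logic" described in commit dcebf026 might be pictured as the decision function below. This is a user-space sketch only: the function name is hypothetical, and the encoding of the parameter (-1 for auto, 0 for disabled, any other value for enabled) is an assumption made for this example, because the commit message itself does not list the values.

```c
#include <stdbool.h>

/*
 * Decide whether GPU recovery should run.
 * Assumed encoding: -1 = auto, 0 = disabled, anything else = enabled.
 * In auto mode, recovery is off on bare metal and on under SR-IOV,
 * as stated in the commit message.
 */
static bool sketch_should_recover_gpu(int gpu_recovery, bool is_sriov_vf)
{
    if (gpu_recovery == 0)
        return false;
    if (gpu_recovery == -1)
        return is_sriov_vf;
    return true;
}
```

The forced reset from debugfs mentioned in the same commit is not modeled here.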

08 Dec, 2017 1 commit

drm: move amd_gpu_scheduler into common location · 1b1f42d8

Committed by Lucas Stach on Dec 06, 2017

This moves and renames the AMDGPU scheduler to a common location in DRM
in order to facilitate re-use by other drivers. This is mostly a
straightforward rename with no code changes.

One notable exception is the function to_drm_sched_fence(), which is no
longer an inline header function to avoid the need to export the
drm_sched_fence_ops_scheduled and drm_sched_fence_ops_finished structures.

Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1b1f42d8

07 Dec, 2017 1 commit

drm/amdgpu: Get rid of dep_sync as a seperate object. · cebb52b7

Committed by Andrey Grodzovsky on Nov 13, 2017

Instead mark the fence as explicit in its amdgpu_sync_entry.

v2:
Fix use-after-free bug and add new parameter description.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cebb52b7

05 Dec, 2017 4 commits

drm/amdgpu:implement new GPU recover(v3) · 5740682e

Committed by Monk Liu on Oct 25, 2017

1. The new implementation is named amdgpu_gpu_recover, which gives more of a hint
about what it does than gpu_reset.

2. gpu_recover unifies bare metal and SR-IOV; only the ASIC reset
part is implemented differently.

3. gpu_recover increases the karma of the hung job and marks its entity/context
as guilty if the karma exceeds the limit.

V2:

4. In the scheduler main routine, a job from a guilty context is immediately
fake-signaled after it is popped from the queue, and its fence is set with the
"-ECANCELED" error.

5. In the scheduler recovery routine, all jobs from the guilty entity are
dropped.

6. In the run_job() routine, the real IB submission is skipped if the @skip parameter
is true or VRAM was lost.

V3:

7. Replace the deprecated GPU reset with the new GPU recover.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5740682e

amd/scheduler:imple job skip feature(v3) · 48f05f29

Committed by Monk Liu on Oct 25, 2017

Jobs are skipped in two cases:
1) When the entity behind the job is marked guilty, a job
popped from this entity's queue is dropped in the sched_main loop.

2) In job_recovery(), skip scheduling the job if its karma is
above the limit, and also skip other jobs sharing the
same fence context. This approach is used because job_recovery() cannot
access job->entity, since the entity may already be dead.

v2:
Some logic fixes.

v3:
When an entity is detected as guilty, don't drop the job in the popping
stage; instead set its fence error to -ECANCELED.

In run_job(), skip the scheduling if either 1) fence->error < 0
or 2) VRAM was lost for this job.
This way we can unify the job skipping logic.

With this feature we can introduce the new GPU recover feature.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

48f05f29
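
The unified skip rule from v3 of commit 48f05f29 can be sketched as one check made before submission. This is a user-space illustration with hypothetical types and names; in particular, representing "VRAM lost" as a plain boolean on the job is a simplification made for this example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a scheduled job. */
struct sketch_job {
    int  fence_error;  /* 0, or a negative errno such as -ECANCELED */
    bool vram_lost;    /* VRAM was lost for this job */
};

/* Skip the real submission if the fence carries an error or VRAM was lost. */
static bool sketch_should_skip_job(const struct sketch_job *job)
{
    return job->fence_error < 0 || job->vram_lost;
}

/* Marking a job from a guilty entity: set the error instead of dropping it. */
static void sketch_mark_guilty(struct sketch_job *job)
{
    job->fence_error = -ECANCELED;
}
```

Because a guilty entity's job is only marked, not dropped, both cases flow through the same check at submission time.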

drm/amdgpu: Remove job->s_entity to avoid keeping reference to stale pointer. · a4176cb4

Committed by Andrey Grodzovsky on Oct 24, 2017

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a4176cb4

drm/amdgpu: Avoid accessing job->entity after the job is scheduled. · d1f6dc1a

Committed by Andrey Grodzovsky on Oct 19, 2017

Bug: amdgpu_job_free_cb was accessing s_job->s_entity when the allocated
amdgpu_ctx (and the entity inside it) had already been deallocated by
amdgpu_cs_parser_fini.

Fix: Save the job's priority at its creation instead of reading it from
s_entity later on.

Signed-off-by: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d1f6dc1a

20 Oct, 2017 2 commits

drm/amdgpu:fix duplicated setting job's vram_lost · c70b78a7

Committed by Monk Liu on Oct 16, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c70b78a7

drm/amdgpu: set -ECANCELED when dropping jobs · 7a0a48dd

Committed by Christian König on Oct 09, 2017

And return the fence error code from the wait functions.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7a0a48dd
