Commits · 87d92e1f909ce366bd0a3426d5c5fb7bf92014c6 · openeuler / Kernel

16 Oct, 2019 · 1 commit

drm/amdgpu: change to query the actual EDC counter · 13ba0344

Committed by Dennis Li on Oct 12, 2019

For potential future requests, change to
query the actual EDC counter.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

03 Oct, 2019 · 6 commits

drm/amdgpu: remove ih_info parameter of gfx_ras_late_init · 41190cd7

Committed by Tao Zhou on Sep 19, 2019

gfx_ras_late_init can get the info by itself.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add common gfx_ras_fini function · 3b7b7647

Committed by Tao Zhou on Sep 12, 2019

gfx_ras_fini can be shared among all generations of gfx.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move gfx ecc functions to generic gfx file · 725253ab

Committed by Tao Zhou on Sep 12, 2019

The common gfx ras ecc functions can be reused among all gfx generations.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update parameter of ras_ih_cb · f5f06e21

Committed by Tao Zhou on Sep 12, 2019

Change struct ras_err_data *err_data to void *err_data to align with
the umc code, so that the callback's declaration in each ras block
need not depend on the structure type.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
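
The opaque-pointer callback pattern described in the ras_ih_cb commit can be illustrated with a minimal sketch. All names here (`ras_err_data_sketch`, `ras_cb_sketch`, `gfx_cb_sketch`, `dispatch_sketch`) are hypothetical stand-ins, not the driver's actual types or functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for an error record; field names are illustrative. */
struct ras_err_data_sketch {
    uint64_t ue_count; /* uncorrectable errors */
    uint64_t ce_count; /* correctable errors */
};

/* Generic callback type: the declaration only sees an opaque pointer. */
typedef int (*ras_cb_sketch)(void *err_data);

/* A block-specific callback converts the opaque pointer back to its own type. */
static int gfx_cb_sketch(void *err_data)
{
    struct ras_err_data_sketch *data = err_data;

    data->ue_count += 1; /* pretend one uncorrectable error was found */
    return 0;
}

/* Common code can invoke any callback without knowing the record layout. */
static int dispatch_sketch(ras_cb_sketch cb, void *err_data)
{
    return cb(err_data);
}
```

Calling `dispatch_sketch(gfx_cb_sketch, &record)` with a zeroed record would leave `ue_count` at 1; the trade-off of this design is that type checking on the argument is given up.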

drm/amdgpu: remove gfx9 NGG · 6de088a0

Committed by Marek Olšák on Sep 19, 2019

Never used.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: do not init mec2 jt for renoir · fec6a08a

Committed by Hawking Zhang on Sep 18, 2019

For ASICs like renoir/arct, the driver doesn't need to load mec2 jt.
When mec1 jt is loaded, mec2 jt will be loaded automatically
since the write is actually broadcast to both.

We need more time to test other gfx9 ASICs, but for now we should
be able to draw the conclusion that mec2 jt is not needed for renoir and
arct.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

17 Sep, 2019 · 1 commit

drm/amdgpu: remove program of lbpw for renoir · 28faa17e

Committed by Aaron Liu on Sep 16, 2019

There is no LBPW on Renoir, so remove the LBPW programming for Renoir.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16 Sep, 2019 · 1 commit

drm/amdgpu: fix CPDMA hang in PRT mode for VEGA10 · ff9d0971

Committed by Tianci.Yin on Sep 10, 2019

Add and_mask since the programming logic of the golden settings changed.
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
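
To show why an and_mask matters for a golden setting, here is a minimal sketch of one common masked read-modify-write convention. The function name `apply_golden_sketch` and the exact rule are assumptions for illustration; the commit text does not spell out the driver's programming logic.

```c
#include <stdint.h>

/*
 * Apply one golden-setting entry to a current register value.
 * Assumed convention: and_mask selects the bits the entry owns,
 * or_mask supplies the new value for those bits.
 */
static uint32_t apply_golden_sketch(uint32_t current, uint32_t and_mask,
                                    uint32_t or_mask)
{
    if (and_mask == 0xffffffffu)
        return or_mask; /* full mask: overwrite the whole register */

    /* clear the owned bits, then set only bits that fall inside the mask */
    return (current & ~and_mask) | (or_mask & and_mask);
}
```

Under this convention, an entry whose and_mask does not cover the bits it wants to set would silently leave them unchanged, which is the kind of mismatch a corrected and_mask avoids.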

14 Sep, 2019 · 7 commits

drm/amdgpu/gfx: switch to amdgpu_gfx_ras_late_init helper function · 6caeee7a

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_gfx_ras_late_init is used to init the gfx specific
ras debugfs/sysfs node and the gfx specific interrupt handler.
It can be shared among gfx generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set ip specific ras interface pointer to NULL after free it · d094aea3

Committed by Hawking Zhang on Sep 03, 2019

This prevents access to dangling pointers.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
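
The free-then-NULL idiom named in this commit can be sketched in user-space C as follows. The structure and function names (`ras_if_sketch`, `gfx_sketch`, `gfx_ras_fini_sketch`) are hypothetical, and `free` stands in for the kernel's `kfree`.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the per-IP ras interface and its owner. */
struct ras_if_sketch {
    int block;
};

struct gfx_sketch {
    struct ras_if_sketch *ras_if;
};

/* Release the interface and clear the pointer so later code cannot reuse it. */
static void gfx_ras_fini_sketch(struct gfx_sketch *gfx)
{
    free(gfx->ras_if);  /* free(NULL) is a no-op, so repeated calls are safe */
    gfx->ras_if = NULL; /* no dangling pointer is left behind */
}
```

With the pointer cleared, a later `if (gfx->ras_if)` check reliably reports that the interface is gone, and a second teardown call does not double-free.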

drm/amdgpu: Avoid HW GPU reset for RAS. · 7c6e68c7

Committed by Andrey Grodzovsky on Sep 13, 2019

Problem:
Under certain conditions, when some IP blocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until a proper solution in PSP/SMU is ready:
When an uncorrectable error happens, the DF will unconditionally
broadcast error event packets to all its clients/slaves upon
receiving the fatal error event and freeze all its outbound queues;
the err_event_athub interrupt will be triggered.
In such a case we use this interrupt
to issue a GPU reset. The GPU reset code is modified for such a case to avoid a HW
reset: it only stops the schedulers, detaches the fences of all in-progress and
not-yet-scheduled jobs, sets an error code on them and signals them.
It also rejects any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on the previous bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive members.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: only apply gds clearing workaround when ras is supported · 39857252

Committed by Hawking Zhang on Aug 31, 2019

The gds clearing workaround should only be applied on ASICs that support gfx ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix memory leak when ras is not supported on specific ip block · 8bf2485a

Committed by Hawking Zhang on Aug 31, 2019

Free ras_if if ras is not supported.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2) · 63fa48db

Committed by Hawking Zhang on Aug 29, 2019

Call the helper function in the late init phase to handle ras init
for the gfx ip block.

v2: call ras_late_fini to clean up when enabling the interrupt fails
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to new amdgpu_nbio structure · bebc0762

Committed by Hawking Zhang on Aug 23, 2019

No functional change, just switch to the new structures.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

27 Aug, 2019 · 1 commit

drm/amdgpu: fix GFXOFF on Picasso and Raven2 · c072b0c2

Committed by Aaron Liu on Aug 27, 2019

For picasso (adev->pdev->device == 0x15d8) and raven2 (adev->rev_id >= 0x8),
the firmware is sufficient to support gfxoff.
Commit 98f58ada made the code return directly for picasso and raven2,
which left gfxoff disabled.

Fixes: 98f58ada ("drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possible")
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
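
The condition quoted in this commit message can be read as a small predicate. The sketch below is an interpretation, not the driver's code: `adev_sketch`, `gfxoff_allowed_sketch`, and the `fw_new_enough` flag (a stand-in for whatever firmware version checks apply to other parts) are all hypothetical; only the two comparisons come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the fields the commit message mentions. */
struct adev_sketch {
    uint16_t pci_device;   /* stands in for adev->pdev->device */
    uint32_t rev_id;       /* stands in for adev->rev_id */
    bool fw_new_enough;    /* assumed result of a firmware version check */
};

/* Picasso and Raven2 keep gfxoff; other parts depend on the firmware check. */
static bool gfxoff_allowed_sketch(const struct adev_sketch *adev)
{
    bool is_picasso = adev->pci_device == 0x15d8;
    bool is_raven2 = adev->rev_id >= 0x8;

    if (is_picasso || is_raven2)
        return true;

    return adev->fw_new_enough;
}
```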

23 Aug, 2019 · 2 commits

drm/amdgpu: update gc/sdma goldensetting for rn · f13580a9

Committed by Aaron Liu on Aug 07, 2019

This patch updates the gc/sdma golden settings for renoir.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add set_gfx_cgpg implement (v2) · 12687955

Committed by Aaron Liu on Jul 16, 2019

Add the set_gfx_cgpg implementation.

v2: check if using sw_smu (Alex)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22 Aug, 2019 · 2 commits

drm/amdgpu: remove duplicated include from gfx_v9_0.c · 252d2a52

Committed by YueHaibing on Jul 10, 2019

Remove duplicated include.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9: update pg_flags after determining if gfx off is possible · b05f65d7

Committed by Alex Deucher on Aug 15, 2019

We need to set certain power gating flags after we determine
if the firmware version is sufficient to support gfxoff.
Previously we set the pg flags in early init, but later
we might have disabled gfxoff if the firmware versions didn't
support it.  Move adding the additional pg flags after we
determine whether or not to support gfxoff.

Fixes: 00544006 ("drm/amdgpu: enable gfxoff again on raven series (v2)")
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>

13 Aug, 2019 · 10 commits

drm/amdgpu: update lbpw for renoir · 40c8a329

Committed by Aaron Liu on Jul 16, 2019

Enable gfx_v9_0_init_lbpw for renoir.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable power gating for renoir · 95f9e74c

Committed by Aaron Liu on Jul 16, 2019

Enable gfx power gating for renoir.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable clock gating for renoir · f78e007f

Committed by Aaron Liu on Aug 12, 2019

Enable gfx and common clock gating for renoir.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add gfx golden settings for renoir (v2) · 33294eb8

Committed by Huang Rui on Jun 23, 2019

This patch adds gfx golden settings for the renoir real ASIC.

v2: update settings (Alex)
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set rlc funcs for renoir · 6b3ad3b2

Committed by Aaron Liu on Jul 24, 2019

Add gfx_v9_0_rlc_funcs for renoir.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add gfx support for renoir · 1aafd447

Committed by Huang Rui on Jul 24, 2019

Add Renoir checks to gfx9 code.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix gfx9 soft recovery · 62cfcb9e

Committed by Pierre-Eric Pelloux-Prayer on Aug 06, 2019

The SOC15_REG_OFFSET() macro wasn't used, making the soft recovery fail.

v2: use WREG32_SOC15 instead of WREG32 + SOC15_REG_OFFSET
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
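
The class of bug fixed here, writing to a block-relative register offset without adding the block's base, can be sketched as below. The base table values and the names `base_sketch` and `reg_offset_sketch` are invented for illustration and do not reflect real hardware offsets or the macro's actual definition.

```c
#include <stdint.h>

#define NUM_SEGMENTS_SKETCH 2

/* Hypothetical per-segment base addresses of one IP block. */
static const uint32_t base_sketch[NUM_SEGMENTS_SKETCH] = { 0x2000, 0xA000 };

/* Absolute offset = segment base + register's block-relative offset. */
static uint32_t reg_offset_sketch(unsigned int base_idx, uint32_t rel)
{
    return base_sketch[base_idx] + rel;
}
```

Writing to the bare relative offset (for example 0x10) instead of `reg_offset_sketch(0, 0x10)` would hit an unrelated register, which is consistent with the failure the commit describes.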

drm/amdgpu: increase CGCG gfx idle threshold for Arcturus · 15e2f43a

Committed by Le Ma on Aug 09, 2019

Follow the hw spec; there is no need to consider gfxoff on Arcturus.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add gfx clock gating for Arcturus · f60481a9

Committed by Le Ma on Aug 07, 2019

Add ARCTURUS case in gfx set clockgating function. No 3d clock on Arcturus.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add check to avoid array bound issue · a2b45994

Committed by Guchun Chen on Aug 08, 2019

Sub_block_index can be passed from user level, so
add a check before accessing the array to
prevent an out-of-bounds array index.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
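
A bounds check of the kind this commit describes might look like the following minimal sketch. The table contents and the names `subblocks_sketch`, `ARRAY_SIZE_SKETCH`, and `lookup_subblock_sketch` are hypothetical; only the idea of validating a user-supplied index before indexing comes from the message.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical sub-block table indexed by a user-supplied value. */
static const char *const subblocks_sketch[] = { "blk_a", "blk_b", "blk_c" };

/* Validate the untrusted index first, then access the array. */
static int lookup_subblock_sketch(size_t index, const char **name_out)
{
    if (index >= ARRAY_SIZE_SKETCH(subblocks_sketch))
        return -EINVAL; /* reject instead of reading past the end */

    *name_out = subblocks_sketch[index];
    return 0;
}
```

Using an unsigned index type means a single upper-bound comparison also rejects values that would be negative in a signed type.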

02 Aug, 2019 · 6 commits

drm/amdgpu: disable MEC2 JT context init for Arcturus · 8fda90e8

Committed by John Clements on Jul 31, 2019

We don't need to handle it like other ASICs.
Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: removed duplicate line · c0dac3c9

Committed by John Clements on Jul 31, 2019

Remove duplicate break.
Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace AMDGPU_RAS_UE with AMDGPU_RAS_SUCCESS · bd2280da

Committed by Tao Zhou on Aug 01, 2019

A ce can also trigger an interrupt, and both ce and ue errors can even be
found in one ras query, so distinguishing between ce and ue in the interrupt
handler is unnecessary.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Suggested-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Extend CU mask to 8 SEs (v3) · 5145d57e

Committed by Jay Cornwall on Jul 18, 2019

Following the bitmap layout logic introduced by:
"drm/amdgpu: support get_cu_info for Arcturus".

v2: squash in fixup for gfx_v9_0.c (Alex)
v3: squash in debug print output fix
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support get_cu_info for Arcturus · 857b82d0

Committed by Le Ma on Jul 08, 2019

This change is needed because the SE/SH layout on Arcturus is 8*1, different from
4*2 (or 4*1) on Vega ASICs.

Currently the cu bitmap array is 4x4 in size, and the bitmap is used widely
across the SW stack. To limit the scale of impact, we make the cu bitmap
array compatible with the SE/SH layout on Arcturus. The cu bits of
each shader array for Arcturus are then stored as follows:
    SE0,SH0 --> bitmap[0][0]
    SE1,SH0 --> bitmap[1][0]
    SE2,SH0 --> bitmap[2][0]
    SE3,SH0 --> bitmap[3][0]
    SE4,SH0 --> bitmap[0][1]
    SE5,SH0 --> bitmap[1][1]
    SE6,SH0 --> bitmap[2][1]
    SE7,SH0 --> bitmap[3][1]
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
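
The SE-to-bitmap mapping listed in this commit message follows a simple modulo/divide pattern, sketched below. The helper name `se_to_bitmap_index_sketch` and the constant `BITMAP_DIM_SKETCH` are hypothetical; the sketch only reproduces the eight listed pairs for SH0 and is not taken from the driver source.

```c
#define BITMAP_DIM_SKETCH 4 /* the cu bitmap array is 4x4 per the message */

/*
 * Map a shader engine index (0..7, SH fixed at 0) to the two indices
 * of the 4x4 bitmap array: SE0..SE3 fill column 0, SE4..SE7 column 1.
 */
static void se_to_bitmap_index_sketch(unsigned int se, unsigned int *row,
                                      unsigned int *col)
{
    *row = se % BITMAP_DIM_SKETCH;
    *col = se / BITMAP_DIM_SKETCH;
}
```

For example, SE5 yields row 1 and column 1, matching the `SE5,SH0 --> bitmap[1][1]` entry in the list.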

drm/amdgpu: cleanup vega10 SRIOV code path · 4cd4c5c0

Committed by Monk Liu on Jul 30, 2019

We can simplify all those unnecessary functions under
SRIOV for vega10 since:
1) PSP L1 policy is enabled by force in SRIOV
2) the original logic always sets all flags, which makes it
   a dummy step

Besides,
1) the ih_doorbell_range set should also be skipped
for VEGA10 SRIOV.
2) the gfx_common registers should also be skipped
for VEGA10 SRIOV.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

01 Aug, 2019 · 3 commits

drm/amdgpu: disable inject for failed subblocks of gfx · dc4d716d

Committed by Dennis Li on Jul 23, 2019

Some subblocks of gfx fail in the inject test, so disable them.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support gfx ras error injection and err_cnt query · 83b0582c

Committed by Dennis Li on Jul 31, 2019

Check the gfx error count in both the ras query function and
the ras interrupt handler.

gfx ras is still disabled by default due to a known stability
issue found in gpu reset.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add RAS callback for gfx · 2c960ea0

Committed by Dennis Li on Jul 31, 2019

Add functions for RAS error injection and for querying the error counter.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
