Commits · 050091ab6e832f7df0b700f1b9b596613643ada4 · openeuler / Kernel

14 Sep, 2019 40 commits

drm/amdkfd: Query kfd device info by CHIP id instead of pci device id · 050091ab

Committed by Yong Zhao on Sep 03, 2019

This optimizes out the PCI device id usage in KFD and makes the code
more maintainable.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Disable page faults while reading user wptrs · cd05c865

Committed by Felix Kuehling on Aug 30, 2019

These wptrs must be pinned and GPU accessible when this is called
from hqd_load functions. So they should never fault. This resolves
a circular lock dependency issue involving four locks including the
DQM lock and mmap_sem.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disable stutter mode for renoir · 811bc15b

Committed by Aaron Liu on Sep 04, 2019

With stutter mode enabled, NMI warnings are printed frequently.
Disable stutter mode for the moment because of the NMI warning storm;
it will be re-enabled once the issue is addressed.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: update renoir_ip_offset.h · 59d1ace3

Committed by Aaron Liu on Sep 04, 2019

This patch updates MP1_BASE in renoir_ip_offset.h
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Acked-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: implement sysfs for getting dpm clock · 6ab3b9e3

Committed by Prike Liang on Sep 04, 2019

With the common interface print_clk_levels, the following dpm clocks can be read:

- pp_dpm_dcefclk
- pp_dpm_fclk
- pp_dpm_mclk
- pp_dpm_sclk
- pp_dpm_socclk
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up load TMR sequence · 337c2007

Committed by John Clements on Sep 04, 2019

Removed redundant goto statement
Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable TA load support in Arcturus · 4fb60b02

Committed by John Clements on Sep 04, 2019

Add support for loading XGMI/RAS FW
Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: change r type to int in gmc_v9_0_late_init · c5b6e585

Committed by Tao Zhou on Sep 02, 2019

Change the type of r from bool to int so that it is suitable for both
bool and int return values.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
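
The effect of such a type change can be seen in a small standalone sketch. This is not the driver code: `fake_late_init`, `status_via_bool` and `status_via_int` are hypothetical names, and the helper is assumed to return 0 on success or a negative errno value on failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical init helper: 0 on success, negative errno on failure. */
static int fake_late_init(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Holding the result in a bool collapses every non-zero code to 1. */
static int status_via_bool(int fail)
{
    bool r = fake_late_init(fail);

    return r;
}

/* Holding the result in an int preserves the original error code. */
static int status_via_int(int fail)
{
    int r = fake_late_init(fail);

    return r;
}
```

With a failing helper, `status_via_bool(1)` yields 1 while `status_via_int(1)` yields `-EINVAL`, so only the `int` variant lets a caller propagate the real error.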

drm/amd/powerplay: replace smu->table_count with SMU_TABLE_COUNT in smu (v2) · 871e5e72

Committed by Kevin Wang on Sep 03, 2019

Fix an issue in the patch below:
drm/amd/powerplay: introduce smu table id type to handle the smu table
for each asic
----
"This patch introduces new smu table type, it's to handle the
 different smu table
 defines for each asic with the same smu ip."

Before:
smu->table_count represents the actual table count in the SMC firmware, and
that actual table count is used to check SMU function parameters that name an SMU table.
After:
The logical table count "SMU_TABLE_COUNT" is used to check function parameters,
because table IDs are already mapped in the SMU driver,
and SMU functions use the logical table ID, not the actual table ID, to check their parameters.

v2: squash in warning fix
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
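
A minimal sketch of the before/after difference, assuming a made-up ASIC that implements only two of three logical tables. Apart from `SMU_TABLE_COUNT`, every name here (`TABLE_*`, `asic_table_map`, `lookup_fw_table`) is hypothetical and not taken from the driver.

```c
#include <errno.h>

/* Logical table ids shared by all ASICs (hypothetical subset). */
enum table_id_sketch {
    TABLE_PPTABLE,
    TABLE_WATERMARKS,
    TABLE_METRICS,
    SMU_TABLE_COUNT    /* logical count, not the firmware's count */
};

#define TABLE_UNMAPPED (-1)

/* Per-ASIC mapping from logical id to firmware table index.
 * This made-up ASIC has only two firmware tables (0 and 1). */
static const int asic_table_map[SMU_TABLE_COUNT] = {
    [TABLE_PPTABLE]    = 0,
    [TABLE_WATERMARKS] = TABLE_UNMAPPED,
    [TABLE_METRICS]    = 1,
};

/* Validate against the logical count, then translate to a firmware index. */
static int lookup_fw_table(int logical_id)
{
    if (logical_id < 0 || logical_id >= SMU_TABLE_COUNT)
        return -EINVAL;
    if (asic_table_map[logical_id] == TABLE_UNMAPPED)
        return -ENOENT;
    return asic_table_map[logical_id];
}
```

In this sketch, checking against the firmware's actual count (2) would reject `TABLE_METRICS`, whose logical id is 2, even though the ASIC supports it.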

drm/amd/amdgpu: add sw_fini interface for df_funcs · f1d59e00

Committed by Jack Zhang on Sep 03, 2019

Add a sw_fini interface to df_funcs.
This interface removes the sysfs file of the df_cntr_avail
function.

The old behavior only created the df_cntr_avail sysfs file
in sw_init but never removed it, for lack of a sw_fini
interface. As a result, the driver reported a sysfs creation
failure when it was loaded for the second time.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
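
The failure mode can be reproduced with a toy registry in plain C. This is only a sketch: the `nodes` array stands in for sysfs, the registry is assumed to refuse duplicate names with `-EEXIST`, and `node_create`, `node_remove`, `df_sw_init_sketch` and `df_sw_fini_sketch` are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 8

/* Toy registry standing in for sysfs: one slot per created file name. */
static const char *nodes[MAX_NODES];

static int node_create(const char *name)
{
    int free_slot = -1;

    for (int i = 0; i < MAX_NODES; i++) {
        if (nodes[i] && strcmp(nodes[i], name) == 0)
            return -EEXIST;        /* duplicate name is refused */
        if (!nodes[i] && free_slot < 0)
            free_slot = i;         /* remember the first empty slot */
    }
    if (free_slot < 0)
        return -ENOSPC;
    nodes[free_slot] = name;
    return 0;
}

static void node_remove(const char *name)
{
    for (int i = 0; i < MAX_NODES; i++)
        if (nodes[i] && strcmp(nodes[i], name) == 0)
            nodes[i] = NULL;
}

/* Init creates the node; fini is the missing counterpart that removes it. */
static int df_sw_init_sketch(void)
{
    return node_create("df_cntr_avail");
}

static void df_sw_fini_sketch(void)
{
    node_remove("df_cntr_avail");
}
```

Calling `df_sw_init_sketch()` twice without `df_sw_fini_sketch()` in between fails the second time; with the fini call in between, the second init succeeds.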

drm/amdgpu: init UMC & RSMU register base address · 9dc91342

Committed by Hawking Zhang on Sep 03, 2019

UMC RAS feature requires access to UMC & RSMU registers
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function · 1c70d3d9

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_nbio_ras_late_init is used to init the nbio-specific
ras debugfs/sysfs node and the nbio-specific interrupt handler.
It can be shared among nbio generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mmhub: switch to amdgpu_mmhub_ras_late_init helper function · 47930de4

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_mmhub_ras_late_init is used to init the mmhub-specific
ras debugfs/sysfs node and the mmhub-specific interrupt handler.
It can be shared among mmhub generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sdma: switch to amdgpu_sdma_ras_late_init helper function · bfcf62c2

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_sdma_ras_late_init is used to init the sdma-specific
ras debugfs/sysfs node and the sdma-specific interrupt handler.
It can be shared among sdma generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: switch to amdgpu_gfx_ras_late_init helper function · 6caeee7a

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_gfx_ras_late_init is used to init the gfx-specific
ras debugfs/sysfs node and the gfx-specific interrupt handler.
It can be shared among gfx generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gmc: switch to amdgpu_gmc_ras_late_init helper function · a85eff14

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_gmc_ras_late_init is used to init the gmc-specific
ras debugfs/sysfs node and the gmc-specific interrupt handler.
It can be shared among gmc generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set ip specific ras interface pointer to NULL after free it · d094aea3

Committed by Hawking Zhang on Sep 03, 2019

to prevent access to dangling pointers
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
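
The general pattern, shown as a userspace sketch rather than the driver code: the structs and the `ip_block_*` functions are hypothetical stand-ins, and `calloc`/`free` take the place of the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a per-IP RAS interface object and its owner. */
struct ras_if_sketch {
    int block;
};

struct ip_block_sketch {
    struct ras_if_sketch *ras_if;
};

static int ip_block_init(struct ip_block_sketch *ip)
{
    ip->ras_if = calloc(1, sizeof(*ip->ras_if));
    return ip->ras_if ? 0 : -1;
}

static void ip_block_fini(struct ip_block_sketch *ip)
{
    free(ip->ras_if);
    ip->ras_if = NULL;    /* later NULL checks now see "no interface" */
}

/* Callers test the pointer instead of touching freed memory. */
static int ip_block_has_ras(const struct ip_block_sketch *ip)
{
    return ip->ras_if != NULL;
}
```

Because `free(NULL)` is a no-op in C, clearing the pointer also makes a repeated `ip_block_fini` call harmless in this sketch.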

dmr/amdgpu: Add system auto reboot to RAS. · d5ea093e

Committed by Andrey Grodzovsky on Aug 22, 2019

In case of a RAS error, allow the user to configure automatic system
reboot through ras_ctrl.
This is also part of the temporary workaround for the RAS
hang problem.

v4: Use latest kernel API for disk sync.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid HW GPU reset for RAS. · 7c6e68c7

Committed by Andrey Grodzovsky on Sep 13, 2019

Problem:
Under certain conditions, when some IP blocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until a proper solution in PSP/SMU is ready:
When an uncorrectable error happens, the DF unconditionally
broadcasts error event packets to all its clients/slaves upon
receiving the fatal error event and freezes all its outbound queues;
the err_event_athub interrupt is then triggered.
In that case we use this interrupt to issue a GPU reset. The GPU reset
code is modified for this case to avoid a HW reset: it only stops the
schedulers, detaches the fences of all in-progress and not-yet-scheduled
jobs, sets an error code on them and signals them.
It also rejects any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on previous bug fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive members.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix bugs in amdgpu_device_gpu_recover in XGMI case. · 12ffa55d

Committed by Andrey Grodzovsky on Aug 30, 2019

Issue 1:
In the XGMI case, amdgpu_device_lock_adev for the other devices in the hive
was called too late, after their respective schedulers had already been accessed.
So relocate the lock to the beginning of accessing the other devs.

Issue 2:
Using amdgpu_device_ip_need_full_reset to switch the device list from
all devices in the hive to the single 'master' device that owns this reset
call is wrong, because when stopping schedulers we iterate over all the devices
in the hive but when restarting we only reactivate the 'master' device.
Also, if amdgpu_device_pre_asic_reset concludes that a full reset IS
needed, we then have to stop the schedulers for all devices in the hive and not
only the 'master', but with amdgpu_device_ip_need_full_reset we have
already missed the opportunity to do so. So just remove this logic and
always stop and start all schedulers for all devices in the hive.

Also minor cleanup and print fix.

v4: Minor coding style fix.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove amdgpu_cs_try_evict · 43ce6bab

Committed by Christian König on Aug 30, 2019

Trying to evict things from the current working set doesn't work that
well anymore because of per VM BOs.

Rely on reserving VRAM for page tables to avoid contention.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: reserve at least 4MB of VRAM for page tables v2 · 9d1b3c78

Committed by Christian König on Aug 30, 2019

This hopefully helps reduce the contention for page tables.

v2: adjust maximum reported VRAM size as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use moving fence instead of exclusive for VM updates · 629be203

Committed by Christian König on Sep 13, 2019

Make VM updates depend on the moving fence instead of the exclusive one.

Makes it less likely to actually have a dependency.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: do proper cleanups on hw_fini · faa695c7

Committed by Evan Quan on Sep 02, 2019

These are needed for smu_reset support.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: update cached feature enablement status V3 · c66846e0

Committed by Evan Quan on Aug 21, 2019

The cached feature enablement status needs to be updated after pp_feature
settings. Another fix for the commit below:
drm/amd/powerplay: implment sysfs feature status function in smu

V2: update smu_feature_update_enable_state() and related code
V3: use bitmap_or and bitmap_andnot
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
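
The V3 note refers to the kernel's `bitmap_or` and `bitmap_andnot` helpers. The sketch below shows the same two operations on a single 64-bit mask in plain C; `update_enabled` and its parameters are hypothetical, and the real driver works on kernel bitmaps rather than a `uint64_t`.

```c
#include <stdint.h>

/*
 * Update a cached "enabled features" mask.
 * enable != 0: set the bits in mask   (counterpart of bitmap_or).
 * enable == 0: clear the bits in mask (counterpart of bitmap_andnot).
 */
static uint64_t update_enabled(uint64_t enabled, uint64_t mask, int enable)
{
    if (enable)
        return enabled | mask;
    return enabled & ~mask;
}
```

For example, enabling bit 1 on a cache of `0x5` gives `0x7`, and disabling bits 0 and 2 from `0x7` gives `0x2`.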

drm/amd/powerplay: guard manual mode prerequisite for clock level force · f78c47f6

Committed by Evan Quan on Aug 30, 2019

Force clock level is for dpm manual mode only.
Reported-by: Candice Li <candice.li@amd.com>
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jack Gui <Jack.Gui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: only apply gds clearing workaround when ras is supported · 39857252

Committed by Hawking Zhang on Aug 31, 2019

gds clearing workaround should only be applied on asics that support gfx ras
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix memory leak when ras is not supported on specific ip block · 8bf2485a

Committed by Hawking Zhang on Aug 31, 2019

free ras_if if ras is not supported
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: check mmhub_funcs pointer before refering to it · 4ce71be6

Committed by Hawking Zhang on Aug 31, 2019

mmhub callback functions are not initialized for all the ASICs
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
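
A guarded call through an optional callback table can be sketched as follows. The struct layouts and function names are hypothetical, and returning 0 when no callback is present is an assumption made for this example, not a statement about the driver.

```c
#include <stddef.h>

/* Hypothetical callback table; some ASICs provide none at all. */
struct mmhub_funcs_sketch {
    int (*ras_late_init)(void *dev);
};

struct device_sketch {
    const struct mmhub_funcs_sketch *mmhub_funcs;    /* may be NULL */
};

/* Check both the table pointer and the callback before dereferencing. */
static int mmhub_ras_late_init_sketch(struct device_sketch *dev)
{
    if (dev->mmhub_funcs && dev->mmhub_funcs->ras_late_init)
        return dev->mmhub_funcs->ras_late_init(dev);
    return 0;    /* assumed: nothing to do when the callback is absent */
}

/* Demo callback used only to show that the guarded call goes through. */
static int demo_ras_late_init(void *dev)
{
    (void)dev;
    return 7;
}
```

A device with no table, or with a table whose callback is unset, is skipped; a device with `demo_ras_late_init` installed gets its return value passed back.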

drm/amdgpu: Remove unnecessary TLB workaround (v2) · 17da41bf

Committed by Felix Kuehling on Aug 29, 2019

This workaround is better handled in user mode in a way that doesn't
require allocating extra memory and breaking userptr BOs.

The TLB bug is a performance bug, not a functional or security bug.
Hence it is safe to remove this kernel part of the workaround to
allow a better workaround using only virtual address alignments in
user mode.

v2: Removed VI_BO_SIZE_ALIGN definition
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use optimal mtypes and PTE bits for Arcturus · e0253d08

Committed by Felix Kuehling on Aug 26, 2019

For compute VRAM allocations on Arcturus use the new RW mtype
for non-coherent local memory, CC mtype for coherent local
memory and PTE_SNOOPED bit for invalidating non-dirty cache
lines on remote XGMI mappings.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Determing PTE flags separately for each mapping (v3) · d0ba51b1

Committed by Felix Kuehling on Aug 26, 2019

The same BO can be mapped with different PTE flags by different GPUs.
Therefore determine the PTE flags separately for each mapping instead
of storing them in the KFD buffer object.

Add a helper function to determine the PTE flags to be extended with
ASIC and memory-type-specific logic in subsequent commits.

v2: Split Arcturus-specific MTYPE changes into separate commit
v3: Fix return type of get_pte_flags to uint64_t
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Support new arcturus mtype · 093e48c0

Committed by Oak Zeng on Jul 26, 2019

Arcturus repurposed mtype WC to RW. Modify gmc functions
to support the new mtype
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Extends amdgpu vm definitions (v2) · 484deaed

Committed by Oak Zeng on Jul 26, 2019

Add RW mtype introduced for arcturus.

v2:
* Don't add probe-invalidation bit from UAPI
* Don't add unused AMDGPU_MTYPE_ definitions
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_ras_late_init for nbio v7_4 (v2) · 22e1d14f

Committed by Hawking Zhang on Aug 29, 2019

call helper function in late init phase to handle ras init
for nbio ip block

v2: init local var r to 0 so that the function does not return failure
on asics that don't have a ras_late_init implementation
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3) · 9ad1dc29

Committed by Hawking Zhang on Aug 29, 2019

ras_late_init callback function will be used to do common ras
init in late init phase.

v2: call ras_late_fini to do cleanup when it fails to enable the interrupt

v3: rename sysfs/debugfs node name to pcie_bif_xxx
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add mmhub ras_late_init callback function (v2) · dda79907

Committed by Hawking Zhang on Aug 30, 2019

The function will be called in late init phase to do mmhub
ras init

v2: check ras_late_init function pointer before invoking the
function
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_ras_late_init for gmc v9 block (v2) · 2452e778

Committed by Hawking Zhang on Aug 29, 2019

call helper function in late init phase to handle ras init
for gmc ip block

v2: call ras_late_fini to do cleanup when it fails to enable the interrupt
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_ras_late_init for sdma v4 block (v2) · 7d0a31e8

Committed by Hawking Zhang on Aug 29, 2019

call helper function in late init phase to handle ras init
for sdma ip block

v2: call ras_late_fini to do cleanup when it fails to enable the interrupt
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: switch to amdgpu_ras_late_init for gfx v9 block (v2) · 63fa48db

Committed by Hawking Zhang on Aug 29, 2019

call helper function in late init phase to handle ras init
for gfx ip block

v2: call ras_late_fini to do cleanup when it fails to enable the interrupt
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel synced successfully more than 1 year ago