- 14 9月, 2019 40 次提交
-
-
由 Yong Zhao 提交于
This optimizes out the pci device id usage in KFD and makes the code more maintainable. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
These wptrs must be pinned and GPU accessible when this is called from hqd_load functions. So they should never fault. This resolves a circular lock dependency issue involving four locks including the DQM lock and mmap_sem. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
With stutter mode enabled, NMI prints frequently. Disable stutter for the moment because NMI warning storm, and will enable it back till the issue is addressed Signed-off-by: NAaron Liu <aaron.liu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
This patch updates MP1_BASE in renoir_ip_offset.h Signed-off-by: NAaron Liu <aaron.liu@amd.com> Acked-by: NRoman Li <roman.li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Prike Liang 提交于
With the common interface print_clk_levels can get the following dpm clock: -pp_dpm_dcefclk -pp_dpm_fclk -pp_dpm_mclk -pp_dpm_sclk -pp_dpm_socclk Signed-off-by: NPrike Liang <Prike.Liang@amd.com> Reviewed-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NKevin Wang <kevin1.wang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
Removed redundant goto statement Signed-off-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
Add support for loading XGMI/RAS FW Signed-off-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
change r type from bool to int, suitable for both bool and int return value Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Kevin Wang 提交于
fix bellow patch issue: drm/amd/powerplay: introduce smu table id type to handle the smu table for each asic ---- "This patch introduces new smu table type, it's to handle the different smu table defines for each asic with the same smu ip." before: use smu->table_count to represent the actual table count in smc firmware use actual table count to check smu function parameter about smu table after: use logic table count "SMU_TABLE_COUNT" to check function parameter because table id already mapped in smu driver, and smu function will use logic table id not actual table id to check func parameter. v2: squash in warning fix Signed-off-by: NKevin Wang <kevin1.wang@amd.com> Reviewed-by: NEvan Quan <evan.quan@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Zhang 提交于
add sw_fini interface of df_funcs. This interface will remove sysfs file of df_cntr_avail function. The old behavior only create sysfs of df_cntr_avail in sw_init, but never remove it for lack of sw_fini interface. With this,driver will report create sysfs fail when it's loaded for the second time. Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Reviewed-by: NJonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
UMC RAS feature requires access to UMC & RSMU registers Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_nbio_ras_late_init is used to init nbio specfic ras debugfs/sysfs node and nbio specific interrupt handler. It can be shared among nbio generations Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_mmhub_ras_late_init is used to init mmhub specfic ras debugfs/sysfs node and mmhub specific interrupt handler. It can be shared among mmhub generations Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_sdma_ras_late_init is used to init sdma specfic ras debugfs/sysfs node and sdma specific interrupt handler. It can be shared among sdma generations Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_gfx_ras_late_init is used to init gfx specfic ras debugfs/sysfs node and gfx specific interrupt handler. It can be shared among gfx generations Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_gmc_ras_late_init is used to init gmc specfic ras debugfs/sysfs node and gmc specific interrupt handler. It can be shared among gmc generations. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
to prevent access to dangling pointers Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
In case of RAS error allow user configure auto system reboot through ras_ctrl. This is also part of the temproray work around for the RAS hang problem. v4: Use latest kernel API for disk sync. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Problem: Under certain conditions, when some IP bocks take a RAS error, we can get into a situation where a GPU reset is not possible due to issues in RAS in SMU/PSP. Temporary fix until proper solution in PSP/SMU is ready: When uncorrectable error happens the DF will unconditionally broadcast error event packets to all its clients/slave upon receiving fatal error event and freeze all its outbound queues, err_event_athub interrupt will be triggered. In such case and we use this interrupt to issue GPU reset. THe GPU reset code is modified for such case to avoid HW reset, only stops schedulers, deatches all in progress and not yet scheduled job's fences, set error code on them and signals. Also reject any new incoming job submissions from user space. All this is done to notify the applications of the problem. v2: Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c Remove print param from amdgpu_ras_query_error_count v3: Update based on prevoius bug fixing patch to properly call amdgpu_amdkfd_pre_reset for other XGMI hive memebers. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Issue 1: In XGMI case amdgpu_device_lock_adev for other devices in hive was called to late, after access to their repsective schedulers. So relocate the lock to the begining of accessing the other devs. Issue 2: Using amdgpu_device_ip_need_full_reset to switch the device list from all devices in hive to the single 'master' device who owns this reset call is wrong because when stopping schedulers we iterate all the devices in hive but when restarting we will only reactivate the 'master' device. Also, in case amdgpu_device_pre_asic_reset conlcudes that full reset IS needed we then have to stop schedulers for all devices in hive and not only the 'master' but with amdgpu_device_ip_need_full_reset we already missed the opprotunity do to so. So just remove this logic and always stop and start all schedulers for all devices in hive. Also minor cleanup and print fix. v4: Minor coding style fix. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Trying to evict things from the current working set doesn't work that well anymore because of per VM BOs. Rely on reserving VRAM for page tables to avoid contention. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChunming Zhou <david1.zhou@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
This hopefully helps reduce the contention for page tables. v2: adjust maximum reported VRAM size as well Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChunming Zhou <david1.zhou@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Make VM updates depend on the moving fence instead of the exclusive one. Makes it less likely to actually have a dependency. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChunming Zhou <david1.zhou@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
These are needed for smu_reset support. Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NJack Gui <Jack.Gui@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
Need to update in cache feature enablement status after pp_feature settings. Another fix for the commit below: drm/amd/powerplay: implment sysfs feature status function in smu V2: update smu_feature_update_enable_state() and relates V3: use bitmap_or and bitmap_andnot Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NJack Gui <Jack.Gui@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
Force clock level is for dpm manual mode only. Reported-by: NCandice Li <candice.li@amd.com> Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NJack Gui <Jack.Gui@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
gds clearing workaround should only be applied on asics that support gfx ras Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
free ras_if if ras is not supported Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
mmhub callback functions are not initialized for all the ASICs Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
This workaround is better handled in user mode in a way that doesn't require allocating extra memory and breaking userptr BOs. The TLB bug is a performance bug, not a functional or security bug. Hence it is safe to remove this kernel part of the workaround to allow a better workaround using only virtual address alignments in user mode. v2: Removed VI_BO_SIZE_ALIGN definition Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
For compute VRAM allocations on Arturus use the new RW mtype for non-coherent local memory, CC mtype for coherent local memory and PTE_SNOOPED bit for invalidating non-dirty cache lines on remote XGMI mappings. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Tested-by: NAmber Lin <Amber.Lin@amd.com> Reviewed-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
The same BO can be mapped with different PTE flags by different GPUs. Therefore determine the PTE flags separately for each mapping instead of storing them in the KFD buffer object. Add a helper function to determine the PTE flags to be extended with ASIC and memory-type-specific logic in subsequent commits. v2: Split Arcturus-specific MTYPE changes into separate commit v3: Fix return type of get_pte_flags to uint64_t Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Arcturus repurposed mtype WC to RW. Modify gmc functions to support the new mtype Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Add RW mtype introduced for arcturus. v2: * Don't add probe-invalidation bit from UAPI * Don't add unused AMDGPU_MTYPE_ definitions Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
call helper function in late init phase to handle ras init for nbio ip block v2: init local var r to 0 in case the function return failure on asics that don't have ras_late_init implementation Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
ras_late_init callback function will be used to do common ras init in late init phase. v2: call ras_late_fini to do cleanup when fails to enable interrupt v3: rename sysfs/debugfs node name to pcie_bif_xxx Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
The function will be called in late init phase to do mmhub ras init v2: check ras_late_init function pointer before invoking the function Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
call helper function in late init phase to handle ras init for gmc ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
call helper function in late init phase to handle ras init for sdma ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
call helper function in late init phase to handle ras init for gfx ip block v2: call ras_late_fini to do clean up when fail to enable interrupt Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-