- 16 9月, 2020 2 次提交
-
-
由 John Clements 提交于
Output RAS init status If RAS init fails, teardown RAS context Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
At this point the ASIC is already post reset by the HW/PSP so the HW not in proper state to be configured for suspension, some blocks might be even gated and so best is to avoid touching it. v2: Rename in_dpc to more meaningful name Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 8月, 2020 2 次提交
-
-
由 Tao Zhou 提交于
asd is not ready for some ASICs in early stage, and psp->asd_fw is more generic than ASIC name in the check. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NJiansong Chen <Jiansong.Chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
asd is not ready for some ASICs in early stage, and psp->asd_fw is more generic than ASIC name in the check. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NJiansong Chen <Jiansong.Chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 8月, 2020 2 次提交
-
-
由 Luben Tuikov 提交于
Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dennis Li 提交于
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 8月, 2020 2 次提交
-
-
由 Christian König 提交于
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1a. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Wenhui Sheng 提交于
Enable the RAP TA loading path and add RAP test trigger interface. v2: fix potential mem leak issue Signed-off-by: NWenhui Sheng <Wenhui.Sheng@amd.com> Reviewed-by: NGuchun Chen <Guchun.Chen@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 8月, 2020 2 次提交
-
-
由 Liu ChengZhe 提交于
1. For Navi12, CHIP_SIENNA_CICHLID, skip tmr load operation; 2. Check pointer before release firmware. v2: use CHIP_SIENNA_CICHLID instead v3: remove local "bool ret"; fix grammer issue v4: use my name instead of "root" v5: fix grammer issue and indent issue Signed-off-by: NLiu ChengZhe <ChengZhe.Liu@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Colin Ian King 提交于
There is a spelling mistake in a DRM_ERROR error message. Fix it. This got lost in a merge, restore the fix. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> (cherry picked from commit 4afaa61d)
-
- 31 7月, 2020 1 次提交
-
-
由 Liu ChengZhe 提交于
1. For Navi12, CHIP_SIENNA_CICHLID, skip tmr load operation; 2. Check pointer before release firmware. v2: use CHIP_SIENNA_CICHLID instead v3: remove local "bool ret"; fix grammer issue v4: use my name instead of "root" v5: fix grammer issue and indent issue Signed-off-by: NLiu ChengZhe <ChengZhe.Liu@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 7月, 2020 1 次提交
-
-
由 John Clements 提交于
do not abort psp asd load sequence for sienna cichlid Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 7月, 2020 2 次提交
-
-
由 Jiansong Chen 提交于
Currently skip ASD FW loading and ih reroute per sienna_cichlid. Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
resolve bug calculating fw start address within binary Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 7月, 2020 1 次提交
-
-
由 Colin Ian King 提交于
There is a spelling mistake in a DRM_ERROR error message. Fix it. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 7月, 2020 2 次提交
-
-
由 Huang Rui 提交于
TMR is required to be destoried with GFX_CMD_ID_DESTROY_TMR while the system goes to suspend. Otherwise, PSP may return the failure state (0xFFFF007) on Gfx-2-PSP command GFX_CMD_ID_SETUP_TMR after do multiple times suspend/resume. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Huang Rui 提交于
Unload ASD function in suspend phase. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 08 7月, 2020 3 次提交
-
-
由 John Clements 提交于
add support for loading ucode with ta_firmware_header_v2_0 Reviewed-by: NBhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
TMR is required to be destoried with GFX_CMD_ID_DESTROY_TMR while the system goes to suspend. Otherwise, PSP may return the failure state (0xFFFF007) on Gfx-2-PSP command GFX_CMD_ID_SETUP_TMR after do multiple times suspend/resume. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
Unload ASD function in suspend phase. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 7月, 2020 9 次提交
-
-
由 Nirmoy Das 提交于
The release_firmware() function is NULL tolerant so we do not need to check for NULL param before calling it. Signed-off-by: NNirmoy Das <nirmoy.das@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Nirmoy Das 提交于
Used sparse(make C=1) to find these loose ends. v2: removed unwanted extra line Signed-off-by: NNirmoy Das <nirmoy.das@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Emily Deng 提交于
Guest VM issue the PSP clear_vf_fw command at 2 points: 1.On VF driver loading, after VF message PSP to setup rings, the next command is “clear_vf_fw” 2.On VF driver unload before VF message to destroy rings Signed-off-by: NEmily Deng <Emily.Deng@amd.com> Ack-by: NMonk.liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Add support for loading SPL firmware. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Support for psp firmware header version v1_3 initialization and information print. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
As all four sdma firmware are same, PSP only receive one SDMA fw. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
Convert to psp defined ucode item, so that psp can recognize them. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NJack Xiao <Jack.Xiao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Skip ASD FW load for sienna_cichlid currently. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 5月, 2020 1 次提交
-
-
由 Likun Gao 提交于
Change memory training init and finit a common function, as it only have software behavior do not relay on the IP version of PSP. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 5月, 2020 1 次提交
-
-
由 Likun Gao 提交于
Only ras supportted need to set MP1 state to prepare for unload before reloading SMU FW. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 5月, 2020 3 次提交
-
-
由 Hawking Zhang 提交于
drop IP specific psp function for rlc autoload since the autoload_supported was introduced to mark ASICs that support rlc_autoload Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
TRIGGER_ERROR is common ras ta command for all the ASICs that support RAS feature. switch to common helper to avoid duplicate implementation per IP generation Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
get_hive_id/get_node_id/get_topology_info/set_topology_info are common xgmi command supported by TA for all the ASICs that support xgmi link. They should be implemented as common helper functions to avoid duplicated code per IP generation Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 5月, 2020 1 次提交
-
-
由 John Clements 提交于
Invoke sequence should abort when ras interrupt is detected before reading TA host shared memory Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 5月, 2020 1 次提交
-
-
由 John Clements 提交于
RAS TA shall notify driver with flags of error specifics Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 4月, 2020 3 次提交
-
-
由 Hawking Zhang 提交于
driver already had psp_firmware_header struture to deal with different layout of sos ucode. the sos micorcode initialization could be common one. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
asd is unified ucode across asic. it is not necessary to keep its software structure to be ip specific one Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
The driver can't access UCODE_DATA/ADDR registers on production boards. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-