- 23 9月, 2020 2 次提交
-
-
由 Stanley.Yang 提交于
GCEA/MMHUB EA error should not result to DF freeze, this is fixed in next generation, but for some reasons the GCEA/MMHUB EA error will result to DF freeze in previous generation, diver should avoid to indicate GCEA/MMHUB EA error as hw fatal error in kernel message by read GCEA/MMHUB err status registers. Changed from V1: make query_ras_error_status function more general make read mmhub er status register more friendly Changed from V2: move ras error status query function into do_recovery workqueue Changed from V3: remove useless code from V2, print GCEA error status instance number Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Qinglang Miao 提交于
Simplify the return expression. Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 9月, 2020 1 次提交
-
-
由 Zheng Bin 提交于
Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2805:5-11: WARNING: Comparison to bool Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NZheng Bin <zhengbin13@huawei.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 8月, 2020 1 次提交
-
-
由 Dennis Li 提交于
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 8月, 2020 1 次提交
-
-
由 Christian König 提交于
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1a. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 8月, 2020 2 次提交
-
-
由 shiwu.zhang 提交于
Update golden setting to improve performance on HPC and ML apps Signed-off-by: Nshiwu.zhang <shiwu.zhang@amd.com> Tested-by: Ngang.long <gang.long@amd.com> Reviewed-by: Nguchun.chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shiwu.zhang 提交于
Update golden setting to improve performance on HPC and ML apps Signed-off-by: Nshiwu.zhang <shiwu.zhang@amd.com> Tested-by: Ngang.long <gang.long@amd.com> Reviewed-by: Nguchun.chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 8月, 2020 1 次提交
-
-
由 Monk Liu 提交于
what: the MQD's save and restore of KCQ (kernel compute queue) cost lots of clocks during world switch which impacts a lot to multi-VF performance how: introduce a paramter to control the number of KCQ to avoid performance drop if there is no kernel compute queue needed notes: this paramter only affects gfx 8/9/10 v2: refine namings v3: choose queues for each ring to that try best to cross pipes evenly. v4: fix indentation some cleanupsin the gfx_compute_queue_acquire() v5: further fix on indentations more cleanupsin gfx_compute_queue_acquire() TODO: in the future we will let hypervisor driver to set this paramter automatically thus no need for user to configure it through modprobe in virtual machine Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 7月, 2020 1 次提交
-
-
由 Felix Kuehling 提交于
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 7月, 2020 3 次提交
-
-
由 Nirmoy Das 提交于
Used sparse(make C=1) to find these loose ends. v2: removed unwanted extra line Signed-off-by: NNirmoy Das <nirmoy.das@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 网络编码 提交于
[WHY] The memcpy() function copies n bytes from memory area src to memory area dest. So specify the firmware size in bytes. [How] Correct the calculation. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: Lei Guo <raykwok1150@163.com> Reviewed-by: NJunwei Zhang <zjunweihit@163.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Prike.Liang 提交于
Fixes: c1cf79ca ("drm/amdgpu: use IP discovery table for renoir") This nullptr issue should be specific on the Renoir series during try access the PWR_MISC_CNTL_STATUS when PWR IP not been detected by discovery table. Moreover the PWR IP not existing in Renoir series is expected therefore just avoid access PWR register in Renoir series. Signed-off-by: NPrike.Liang <Prike.Liang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 5月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Just check for APU. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 5月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Add some APU flags to simplify handling of different APU variants. It's easier to understand the special cases if we use names flags rather than checking device ids and silicon revisions. v2: rebase on latest code Acked-by: NEvan Quan <evan.quan@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 5月, 2020 2 次提交
-
-
由 Marek Olšák 提交于
Compute IBs need this too. v2: split out version bump v3: squash in emit frame count fixes Signed-off-by: NMarek Olšák <marek.olsak@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Implement the .mem_sync hook defined earlier. v2: Rename functions Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 12 5月, 2020 2 次提交
-
-
由 Tom St Denis 提交于
On my raven1 system (rev c6) with VBIOS 113-RAVEN-114 GFXOFF is not stable (resulting in large block tiling noise in some applications). Disabling GFXOFF via the quirk list fixes the problems for me. Signed-off-by: NTom St Denis <tom.stdenis@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Tom St Denis 提交于
On my raven1 system (rev c6) with VBIOS 113-RAVEN-114 GFXOFF is not stable (resulting in large block tiling noise in some applications). Disabling GFXOFF via the quirk list fixes the problems for me. Signed-off-by: NTom St Denis <tom.stdenis@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 5月, 2020 1 次提交
-
-
由 Evan Quan 提交于
As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact, this unnecessary cancel_delayed_work_sync may leave a small time window for race condition and is dangerous. Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 5月, 2020 1 次提交
-
-
由 Evan Quan 提交于
As this is already properly handled in amdgpu_gfx_off_ctrl(). In fact, this unnecessary cancel_delayed_work_sync may leave a small time window for race condition and is dangerous. Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 5月, 2020 1 次提交
-
-
由 Oak Zeng 提交于
With previous golden settings, compute task can't use reserved LDS (32K) on CU0 and CU1. On 64K LDS system, if compute work group allocate more than 32K LDS, then it can't be dispatched to CU0 and CU1 because of the reservation. This enables compute task to use reserved LDS on CU0 and CU1. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
-
- 29 4月, 2020 5 次提交
-
-
由 Huang Rui 提交于
Since commit "Move to a per-IB secure flag (TMZ)", we've been seeing hangs in GFX. We need to send FRAME CONTROL stop/start back-to-back, every time we flip the TMZ flag. That is, when we transition from TMZ to non-TMZ we have to send a stop with TMZ followed by a start with non-TMZ, and similarly for transitioning from non-TMZ into TMZ. This patch implements this, thus fixing the GFX hang. v1 -> v2: As suggested by Luben, and accept part of implemetation from this patch: - Put "secure" closed to the loop and use optimization - Change "secure" to bool again, and move "secure == -1" out of loop. v3: Small fixes/optimizations. Reported-and-Tested-by: NPierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Luben Tuikov 提交于
Move from a per-CS secure flag (TMZ) to a per-IB secure flag. Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Luben Tuikov 提交于
Implement an accessor of adev->tmz.enabled. Let not code around access it as "if (adev->tmz.enabled)" as the organization may change. Instead... Recruit "bool amdgpu_is_tmz(adev)" to return exactly this Boolean value. That is, this function is now an accessor of an already initialized and set adev and adev->tmz. Add "void amdgpu_gmc_tmz_set(adev)" to check and set adev->gmc.tmz_enabled at initialization time. After which one uses "bool amdgpu_is_tmz(adev)" to query whether adev supports TMZ. Also, remove circular header file include. v2: Remove amdgpu_tmz.[ch] as requested. v3: Move TMZ into GMC. Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
This patch expands the context control function to support trusted flag while we want to set command buffer in trusted mode. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
This patch expands the emit_tmz function to support trusted flag while we want to set command buffer in trusted mode. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 4月, 2020 1 次提交
-
-
由 Zheng Bin 提交于
Fixes coccicheck warning: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:2534:2-3: Unneeded semicolon Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NZheng Bin <zhengbin13@huawei.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com>
-
- 24 4月, 2020 2 次提交
-
-
由 Yintian Tao 提交于
Wait for the oldest sequence on the ring to be signaled in order to make sure there will be no command overrun. v2: fix coding stype and remove abs operation v3: remove the initialization of variable r Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yintian Tao 提交于
According to the current kiq read register method, there will be race condition when using KIQ to read register if multiple clients want to read at same time just like the expample below: 1. client-A start to read REG-0 throguh KIQ 2. client-A poll the seqno-0 3. client-B start to read REG-1 through KIQ 4. client-B poll the seqno-1 5. the kiq complete these two read operation 6. client-A to read the register at the wb buffer and get REG-1 value Therefore, use amdgpu_device_wb_get() to request reg_val_offs for each kiq read register. v2: fix the error remove v3: fix the print typo v4: remove unused variables Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 4月, 2020 2 次提交
-
-
由 Christian König 提交于
In pp_one_vf mode avoid the extra overhead and read/write the registers without the KIQ. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Acked-by: NYintian Tao <yintian.tao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Fix screen corruption with firefox. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 4月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Fix screen corruption with firefox. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 14 4月, 2020 1 次提交
-
-
由 Guchun Chen 提交于
Prefix RAS message printing in GFX IP with PCI device info, which assists the debug in multiple GPU case. Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2020 3 次提交
-
-
由 Aaron Liu 提交于
Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Tested-by: NAaron Liu <aaron.liu@amd.com> Tested-by: NYuxian Dai <Yuxian.Dai@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Nirmoy Das 提交于
Generate HW IP's sched_list in amdgpu_ring_init() instead of amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(), ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary. This patch also stores sched_list for all HW IPs in one big array in struct amdgpu_device which makes amdgpu_ctx_init_entity() much more leaner. v2: fix a coding style issue do not use drm hw_ip const to populate amdgpu_ring_type enum v3: remove ctx reference and move sched array and num_sched to a struct use num_scheds to detect uninitialized scheduler list v4: use array_index_nospec for user space controlled variables fix possible checkpatch.pl warnings Signed-off-by: NNirmoy Das <nirmoy.das@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Tested-by: NAaron Liu <aaron.liu@amd.com> Tested-by: NYuxian Dai <Yuxian.Dai@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 02 4月, 2020 3 次提交
-
-
由 Christian König 提交于
When we stop the HW for example for GPU reset we should not stop the front-end scheduler. Otherwise we run into intermediate failures during command submission. The scheduler should only be stopped in very few cases: 1. We can't get the hardware working in ring or IB test after a GPU reset. 2. The KIQ scheduler is not used in the front-end and should be disabled during GPU reset. 3. In amdgpu_ring_fini() when the driver unloads. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NNirmoy Das <nirmoy.das@amd.com> Test-by: NDennis Li <dennis.li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tom St Denis 提交于
Clean up the smu10, smu12, and gfx9 drivers to use headers for registers instead of hardcoding in the C source files. Signed-off-by: NTom St Denis <tom.stdenis@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 xinhui pan 提交于
We have three ib pools, they are normal, VM, direct pools. Any jobs which schedule IBs without dependence on gpu scheduler should use DIRECT pool. Any jobs schedule direct VM update IBs should use VM pool. Any other jobs use NORMAL pool. v2: squash in coding style fix Signed-off-by: Nxinhui pan <xinhui.pan@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-