- 29 7月, 2022 2 次提交
-
-
由 Jack Xiao 提交于
mes self test rely on vm mapping, move it after drm sched re-started so that vm mapping can work during gpu reset. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-and-tested-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
This extra call trace dump comes out in every gpu reset. And it gives people a wrong impression that something went wrong. Although actually there was not. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 7月, 2022 1 次提交
-
-
由 Andrey Grodzovsky 提交于
This is a follow-up cleanup to [1]. See bellow refcount balancing for calling amdgpu_job_submit_direct after this cleanup as far as I calculated. amdgpu_fence_emit dma_fence_init 1 dma_fence_get(fence) 2 rcu_assign_pointer(*ptr, dma_fence_get(fence) 3 ---> amdgpu_job_submit_direct completes before fence signaled amdgpu_sa_bo_free (*sa_bo)->fence = dma_fence_get(fence) 4 amdgpu_job_free dma_fence_put 3 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 4 dma_fence_put(f); 3 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 2 amdgpu_fence_process dma_fence_put 1 amdgpu_sa_bo_remove_locked dma_fence_put 0 ---> amdgpu_job_submit_direct completes after fence signaled amdgpu_fence_process dma_fence_put 2 amdgpu_job_free dma_fence_put 1 amdgpu_vcn_enc_get_destroy_msg *fence = dma_fence_get(f) 2 dma_fence_put(f); 1 amdgpu_vcn_enc_ring_test_ib dma_fence_put(fence) 0 [1] - https://patchwork.kernel.org/project/dri-devel/cover/20220624180955.485440-1-andrey.grodzovsky@amd.com/Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 7月, 2022 1 次提交
-
-
由 Likun Gao 提交于
Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 30 6月, 2022 3 次提交
-
-
由 Alex Deucher 提交于
Fixes this issue: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:5094: warning: expecting prototype for amdgpu_device_gpu_recover_imp(). Prototype was for amdgpu_device_gpu_recover() instead Fixes: cf727044 ("drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover") Reviewed-by: NKent Russell <kent.russell@amd.com> Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Kent Russell 提交于
Change amdggpu to amdgpu and pedning to pending Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Use the correct adev variable for the drm_fb_helper in amdgpu_device_gpu_recover(). Noticed by inspection. Fixes: 087451f3 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 28 6月, 2022 2 次提交
-
-
由 Andrey Grodzovsky 提交于
Align refcount behaviour for amdgpu_job embedded HW fence with classic pointer style HW fences by increasing refcount each time emit is called so amdgpu code doesn't need to make workarounds using amdgpu_job.job_run_counter to keep the HW fence refcount balanced. Also since in the previous patch we resumed setting s_fence->parent to NULL in drm_sched_stop switch to directly checking if job->hw_fence is signaled to short circuit reset if already signed. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: NYiqing Yao <yiqing.yao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Problem: After we start handling timed out jobs we assume there fences won't be signaled but we cannot be sure and sometimes they fire late. We need to prevent concurrent accesses to fence array from amdgpu_fence_driver_clear_job_fences during GPU reset and amdgpu_fence_process from a late EOP interrupt. Fix: Before accessing fence array in GPU disable EOP interrupt and flush all pending interrupt handlers for amdgpu device's interrupt line. v2: Switch from irq_get/put to full enable/disable_irq for amdgpu Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 6月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
Use the correct adev variable for the drm_fb_helper in amdgpu_device_gpu_recover(). Noticed by inspection. Fixes: 087451f3 ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 6月, 2022 1 次提交
-
-
由 Yifan Zhang 提交于
enable_mes and enable_mes_kiq are set in both device init and MES IP init. Leave the ones in MES IP init, since it is a more accurate way to judge from GC IP version. Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NJack Xiao <Jack.Xiao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 6月, 2022 4 次提交
-
-
由 Andrey Grodzovsky 提交于
We skip rest requests if another one is already in progress. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Will be read by executors of async reset like debugfs. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 6月, 2022 1 次提交
-
-
由 Ramesh Errabolu 提交于
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 6月, 2022 2 次提交
-
-
由 Somalapuram Amaranath 提交于
Added device coredump information: - Kernel version - Module - Time - VRAM status - Guilty process name and PID - GPU register dumps v1 -> v2: Variable name change v1 -> v2: NULL check v1 -> v2: Code alignment v1 -> v2: Adding dummy amdgpu_devcoredump_free v1 -> v2: memset reset_task_info to zero v2 -> v3: add CONFIG_DEV_COREDUMP for variables v2 -> v3: remove NULL check on amdgpu_devcoredump_read Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Somalapuram Amaranath 提交于
Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 6月, 2022 4 次提交
-
-
由 Alex Deucher 提交于
LVDS support was implemented in DC a while ago. Just DAC support is left to do. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Drop all of the extra cases in the default case. Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mitchell Augustin 提交于
Removed trailing whitespace from end of line in amdgpu_device.c Signed-off-by: NMitchell Augustin <kernel@mitchellaugustin.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Drop extra cases in the default case. Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 5月, 2022 2 次提交
-
-
由 Sunil Khatri 提交于
To enable TMZ feature based on IP version needs adev->ip_version populated but its empty. Move amdgpu_gmc_tmz_set to a place where ip_version is populated. Signed-off-by: NSunil Khatri <sunil.khatri@amd.com> Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Stanley.Yang 提交于
support umc/gfx/sdma ras on guest side Changed from V1: move sriov judgment in amdgpu_ras_interrupt_fatal_error_handler Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 5月, 2022 1 次提交
-
-
由 Likun Gao 提交于
Add sysfs interface to copy VBIOS. v2: squash in fix for proper vmalloc API (Alex) Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 5月, 2022 1 次提交
-
-
由 Yiqing Yao 提交于
[why] lru_list not empty warning in sw fini during repeated device bind unbind. There should be a amdgpu_fence_wait_empty() before the flush_delayed_work() call as Christian suggested. [how] Move to do flush_delayed_work for ttm bo delayed delete wq after fence_driver_hw_fini. Tested by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NYiqing Yao <yiqing.yao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 5月, 2022 4 次提交
-
-
由 Mukul Joshi 提交于
Enable KFD initialization with MES enabled. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Acked-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
For kfd hasn't supported mes, skip kfd routines. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
Use the whole doorbell space for mes. Each queue in one process occupies one doorbell slot to ring the queue submitting. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2022 2 次提交
-
-
由 Hawking Zhang 提交于
Some initial settings now are not available from the atom data table. The assumption that !ps[0] || !ps[1] in amdgpu_atom_asic_init is not valid. In addition, driver needs to strictly follow atomfirmware structure (asic_init_parameters) to initialize parameters used to execute asic_init function, otherwise, the execution of asic_init would fail. This shall be applicable to all soc15 adapters,but let make the transition on soc21 first. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This data has no dependencies, so encapsulate it all within amdgpu_discovery.c. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 4月, 2022 1 次提交
-
-
由 Bokun Zhang 提交于
- In the latest version of the header, there is a variable name change. This should not cause any backward compatibility since the variable is at the same offset in the struct. Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2022 1 次提交
-
-
由 Evan Quan 提交于
With this, we can support more CG flags. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 3月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
If the GPU is passed through to a guest VM, use the PCI BAR for CPU FB access rather than the physical address of carve out. The physical address is not valid in a guest. v2: Fix HDP handing as suggested by Michel Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NMichel Dänzer <mdaenzer@redhat.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 3月, 2022 2 次提交
-
-
由 Philip Yang 提交于
amdgpu_detect_virtualization reads register, amdgpu_device_rreg access adev->reset_domain->sem if kernel defined CONFIG_LOCKDEP, below is the random boot hang backtrace on Vega10. It may get random NULL pointer access backtrace if amdgpu_sriov_runtime is true too. Move amdgpu_reset_create_reset_domain before calling to RREG32. BUG: kernel NULL pointer dereference, address: #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Workqueue: events work_for_cpu_fn RIP: 0010:down_read_trylock+0x13/0xf0 Call Trace: <TASK> amdgpu_device_skip_hw_access+0x38/0x80 [amdgpu] amdgpu_device_rreg+0x1b/0x170 [amdgpu] amdgpu_detect_virtualization+0x73/0x100 [amdgpu] amdgpu_device_init.cold.60+0xbe/0x16b1 [amdgpu] ? pci_bus_read_config_word+0x43/0x70 amdgpu_driver_load_kms+0x15/0x120 [amdgpu] amdgpu_pci_probe+0x1a1/0x3a0 [amdgpu] Fixes: d0fb18b5 ("drm/amdgpu: Move reset sem into reset_domain") Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
We don't support runtime pm on APUs. They support more dynamic power savings using clock and powergating. Reviewed-by: NMario Limonciello <mario.limonciello@amd.com> Tested-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 3月, 2022 2 次提交
-
-
由 Alex Deucher 提交于
These were leftover from bring up and are no longer necessary. The information is available via the IP discovery table. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yifan Zhang 提交于
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set is called. Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 2月, 2022 1 次提交
-
-
由 Andrey Grodzovsky 提交于
According to my investigation of the state of PCI reset recently it's not working. The reason is due to the fact the kernel PCI code rejects SBR when there are more then one PF under same bridge which we always have (at least AUDIO PF but usually more) and that because SBR will reset all the PFS and devices under the same bridge as you and you cannot assume they support SBR. Once we anble FLR support we can reenable this option as FLR is doable on single PF and doens't have this restriction. Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-