- 07 6月, 2022 2 次提交
-
-
由 Somalapuram Amaranath 提交于
Added device coredump information: - Kernel version - Module - Time - VRAM status - Guilty process name and PID - GPU register dumps v1 -> v2: Variable name change v1 -> v2: NULL check v1 -> v2: Code alignment v1 -> v2: Adding dummy amdgpu_devcoredump_free v1 -> v2: memset reset_task_info to zero v2 -> v3: add CONFIG_DEV_COREDUMP for variables v2 -> v3: remove NULL check on amdgpu_devcoredump_read Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Somalapuram Amaranath 提交于
Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 6月, 2022 4 次提交
-
-
由 Alex Deucher 提交于
LVDS support was implemented in DC a while ago. Just DAC support is left to do. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Drop all of the extra cases in the default case. Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mitchell Augustin 提交于
Removed trailing whitespace from end of line in amdgpu_device.c Signed-off-by: NMitchell Augustin <kernel@mitchellaugustin.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Drop extra cases in the default case. Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 5月, 2022 2 次提交
-
-
由 Sunil Khatri 提交于
To enable TMZ feature based on IP version needs adev->ip_version populated but its empty. Move amdgpu_gmc_tmz_set to a place where ip_version is populated. Signed-off-by: NSunil Khatri <sunil.khatri@amd.com> Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Stanley.Yang 提交于
support umc/gfx/sdma ras on guest side Changed from V1: move sriov judgment in amdgpu_ras_interrupt_fatal_error_handler Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 5月, 2022 1 次提交
-
-
由 Likun Gao 提交于
Add sysfs interface to copy VBIOS. v2: squash in fix for proper vmalloc API (Alex) Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 5月, 2022 1 次提交
-
-
由 Yiqing Yao 提交于
[why] lru_list not empty warning in sw fini during repeated device bind unbind. There should be a amdgpu_fence_wait_empty() before the flush_delayed_work() call as Christian suggested. [how] Move to do flush_delayed_work for ttm bo delayed delete wq after fence_driver_hw_fini. Tested by: Yiqing Yao <yiqing.yao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NYiqing Yao <yiqing.yao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 5月, 2022 4 次提交
-
-
由 Mukul Joshi 提交于
Enable KFD initialization with MES enabled. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Acked-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
For kfd hasn't supported mes, skip kfd routines. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
Use the whole doorbell space for mes. Each queue in one process occupies one doorbell slot to ring the queue submitting. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2022 2 次提交
-
-
由 Hawking Zhang 提交于
Some initial settings now are not available from the atom data table. The assumption that !ps[0] || !ps[1] in amdgpu_atom_asic_init is not valid. In addition, driver needs to strictly follow atomfirmware structure (asic_init_parameters) to initialize parameters used to execute asic_init function, otherwise, the execution of asic_init would fail. This shall be applicable to all soc15 adapters,but let make the transition on soc21 first. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This data has no dependencies, so encapsulate it all within amdgpu_discovery.c. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 4月, 2022 1 次提交
-
-
由 Bokun Zhang 提交于
- In the latest version of the header, there is a variable name change. This should not cause any backward compatibility since the variable is at the same offset in the struct. Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2022 1 次提交
-
-
由 Evan Quan 提交于
With this, we can support more CG flags. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 3月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
If the GPU is passed through to a guest VM, use the PCI BAR for CPU FB access rather than the physical address of carve out. The physical address is not valid in a guest. v2: Fix HDP handing as suggested by Michel Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NMichel Dänzer <mdaenzer@redhat.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 3月, 2022 2 次提交
-
-
由 Philip Yang 提交于
amdgpu_detect_virtualization reads register, amdgpu_device_rreg access adev->reset_domain->sem if kernel defined CONFIG_LOCKDEP, below is the random boot hang backtrace on Vega10. It may get random NULL pointer access backtrace if amdgpu_sriov_runtime is true too. Move amdgpu_reset_create_reset_domain before calling to RREG32. BUG: kernel NULL pointer dereference, address: #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI Workqueue: events work_for_cpu_fn RIP: 0010:down_read_trylock+0x13/0xf0 Call Trace: <TASK> amdgpu_device_skip_hw_access+0x38/0x80 [amdgpu] amdgpu_device_rreg+0x1b/0x170 [amdgpu] amdgpu_detect_virtualization+0x73/0x100 [amdgpu] amdgpu_device_init.cold.60+0xbe/0x16b1 [amdgpu] ? pci_bus_read_config_word+0x43/0x70 amdgpu_driver_load_kms+0x15/0x120 [amdgpu] amdgpu_pci_probe+0x1a1/0x3a0 [amdgpu] Fixes: d0fb18b5 ("drm/amdgpu: Move reset sem into reset_domain") Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
We don't support runtime pm on APUs. They support more dynamic power savings using clock and powergating. Reviewed-by: NMario Limonciello <mario.limonciello@amd.com> Tested-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 3月, 2022 2 次提交
-
-
由 Alex Deucher 提交于
These were leftover from bring up and are no longer necessary. The information is available via the IP discovery table. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yifan Zhang 提交于
otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set is called. Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 2月, 2022 1 次提交
-
-
由 Andrey Grodzovsky 提交于
According to my investigation of the state of PCI reset recently it's not working. The reason is due to the fact the kernel PCI code rejects SBR when there are more then one PF under same bridge which we always have (at least AUDIO PF but usually more) and that because SBR will reset all the PFS and devices under the same bridge as you and you cannot assume they support SBR. Once we anble FLR support we can reenable this option as FLR is doable on single PF and doens't have this restriction. Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 2月, 2022 4 次提交
-
-
由 Somalapuram Amaranath 提交于
Dump the list of register values to trace event on GPU reset. Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This test is not particularly useful now that GTT and GART are decoupled in the driver. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Now that we expose the benchmarks via debugfs, there is no longer a need for the module parameter. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
To avoid multiple runs in parallel to avoid mixing results. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 2月, 2022 1 次提交
-
-
由 Jiawei Gu 提交于
Add device pointer so scheduler's printing can use DRM_DEV_ERROR() instead, which makes life easier under multiple GPU scenario. v2: amend all calls of drm_sched_init() v3: fill dev pointer for all drm_sched_init() calls Signed-off-by: NJiawei Gu <Jiawei.Gu@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NChristian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com
-
- 18 2月, 2022 2 次提交
-
-
由 Mario Limonciello 提交于
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Define amdgpu_ras_late_init to call all ras blocks' .ras_late_init. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 2月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
Since this is an existing asic, adjust the code to follow the same logic as previously so the driver state is consistent. No functional change intended. Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 2月, 2022 2 次提交
-
-
由 Surbhi Kakarya 提交于
This patch handles the GPU recovery failure in sriov environment by retrying the reset if the first reset fails. To determine the condition of retry, a new macro AMDGPU_RETRY_SRIOV_RESET is added which returns true if failure is due to ETIMEDOUT, EINVAL or EBUSY, otherwise return false.A new macro AMDGPU_MAX_RETRY_LIMIT is used to limit the retry to 2. It also handles the return status in Post Asic Reset by updating the return code with asic_reset_res and eventually return the return code in amdgpu_job_timedout(). Signed-off-by: NSurbhi Kakarya <surbhi.kakarya@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rajneesh Bhardwaj 提交于
Add missing parameters to fix a kerneldoc warning Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 12 2月, 2022 1 次提交
-
-
由 Andrey Grodzovsky 提交于
Update function name. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reported-by: Nkernel test robot <lkp@intel.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220211205500.601391-1-andrey.grodzovsky@amd.com
-
- 10 2月, 2022 5 次提交
-
-
由 Andrey Grodzovsky 提交于
Since we have a single instance of reset semaphore which we lock only once even for XGMI hive we don't need the nested locking hint anymore. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74120.html
-
由 Andrey Grodzovsky 提交于
This functions needs to be split into 2 parts where one is called only once for locking single instance of reset_domain's sem and reset flag and the other part which handles MP1 states should still be called for each device in XGMI hive. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74118.html
-
由 Andrey Grodzovsky 提交于
We should have a single instance per entrire reset domain. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
-
由 Andrey Grodzovsky 提交于
We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
-
由 Andrey Grodzovsky 提交于
The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
-