- 19 10月, 2022 1 次提交
-
-
由 Victor Zhao 提交于
This reverts commit 5bd8d53f. This commit breaks the reset logic for aldebaran, revert it for now. Will move the mask inside the reset handler. Fixes: 5bd8d53f ("drm/amdgpu: add debugfs amdgpu_reset_level") Signed-off-by: NVictor Zhao <Victor.Zhao@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 21 9月, 2022 1 次提交
-
-
由 Christian König 提交于
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 8月, 2022 3 次提交
-
-
由 Victor Zhao 提交于
In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset v2: add a hang flag to indicate the reset comes from a job timeout, skip ring test and cp halt wait in this case v3: move hang flag to adev Signed-off-by: NVictor Zhao <Victor.Zhao@amd.com> Acked-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Victor Zhao 提交于
Introduce amdgpu_reset_level debugfs in order to help debug and test specific type of reset. Also helps blocking unwanted type of resets. By default, mode2 reset will not be enabled v2: make this debugfs in adev and use debugfs_create_u32 Signed-off-by: NVictor Zhao <Victor.Zhao@amd.com> Acked-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dusica Milinkovic 提交于
[Why] During multi-vf executing benchmark (Luxmark) observed kiq error timeout. It happenes because all of VFs do the tlb invalidation at the same time. Although each VF has the invalidate register set, from hardware side the invalidate requests are queue to execute. [How] In case of 12 VF increase timeout on 12*100ms Signed-off-by: NDusica Milinkovic <Dusica.Milinkovic@amd.com> Acked-by: NShaoyun Liu <shaoyun.liu@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 7月, 2022 1 次提交
-
-
由 Roy Sun 提交于
The comments say that the product number is a 16-digit HEX string so the buffer needs to be at least 17 characters to hold the NUL terminator. Expand the buffer size to 20 to avoid the alignment issues. The comment:Product number should only be 16 characters. Any more,and something could be wrong. Cap it at 16 to be safe Signed-off-by: NRoy Sun <Roy.Sun@amd.com> Reviewed-by: NAndré Almeida <andrealmeid@igalia.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 7月, 2022 1 次提交
-
-
由 Leo Li 提交于
[Why] Being able to configure visual confirm at boot or in cmdline is helpful when debugging. [How] Add a module parameter to configure DC visual confirm, which works the same way as the equivalent debugfs entry. Signed-off-by: NLeo Li <sunpeng.li@amd.com> Reviewed-by: NRodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 7月, 2022 1 次提交
-
-
由 Guchun Chen 提交于
It's redundant, as now switching to rpm_mode to indicate runtime power management mode. Suggested-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 7月, 2022 1 次提交
-
-
由 Likun Gao 提交于
Move reset_context out of gpu recover function to make it configurable for different reset purpose. For the reset way of call gpu_recovery sysfs, force to use full reset method. Otherwise, try soft reset by default if the related ASIC supportted, if soft reset failed, will use full reset. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 6月, 2022 2 次提交
-
-
由 Andrey Grodzovsky 提交于
We removed the wrapper that was queueing the recover function into reset domain queue who was using this name. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
We need to have a work_struct to cancel this reset if another already in progress. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 6月, 2022 2 次提交
-
-
由 Evan Quan 提交于
Enable ASPM support for PCIE 7.4.0 and 7.6.0. Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Ramesh Errabolu 提交于
Add support for peer-to-peer communication among AMD GPUs over PCIe bus. Support REQUIRES enablement of config HSA_AMD_P2P. Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 6月, 2022 2 次提交
-
-
由 Somalapuram Amaranath 提交于
Added device coredump information: - Kernel version - Module - Time - VRAM status - Guilty process name and PID - GPU register dumps v1 -> v2: Variable name change v1 -> v2: NULL check v1 -> v2: Code alignment v1 -> v2: Adding dummy amdgpu_devcoredump_free v1 -> v2: memset reset_task_info to zero v2 -> v3: add CONFIG_DEV_COREDUMP for variables v2 -> v3: remove NULL check on amdgpu_devcoredump_read Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Somalapuram Amaranath 提交于
Allocate memory for register value and use the same values for devcoredump. v1 -> v2: Change krealloc_array() to kmalloc_array() v2 -> v3: Fix alignment Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NShashank Sharma <Shashank.sharma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 6月, 2022 1 次提交
-
-
由 pengfuyuan 提交于
Fix spelling typo in comments. Reported-by: Nk2ci <kernel-bot@kylinos.cn> Signed-off-by: Npengfuyuan <pengfuyuan@kylinos.cn> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 5月, 2022 2 次提交
-
-
由 Mario Limonciello 提交于
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Mario Limonciello 提交于
An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC reset for handling aborted suspend can't work with s2idle. This functionality was introduced in commit daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)"). A few other commits have gone on top of the ASIC reset, but this still doesn't work on the A+A configuration in s2idle. Avoid doing the reset on dGPUs specifically when using s2idle. Fixes: daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 5月, 2022 2 次提交
-
-
由 Likun Gao 提交于
Add Light SDMA (LSDMA) block and related function. LSDMA is a small instance of SDMA mainly for kernel driver use. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Add sysfs interface to copy VBIOS. v2: squash in fix for proper vmalloc API (Alex) Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 5月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
This reverts commit b95dc06a. This workaround is no longer necessary. We have a better workaround in commit f95af4a9 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)"). Reviewed-by: NJavier Martinez Canillas <javierm@redhat.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 5月, 2022 5 次提交
-
-
由 Jack Xiao 提交于
mes_kiq parameter is used to enable mes kiq pipe. This module parameter is unneccessary or enabled by default in final version. v2: reword commit message. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Xiao 提交于
The per-context meta data is a per-context data structure associated with a mes-managed hardware ring, which includes MCBP CSA, ring buffer and etc. v2: fix typo v3: a. use structure instead of typedef b. move amdgpu_mes_ctx_get_offs_* to amdgpu_ring.h c. use __aligned to make alignement Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> -
由 Jack Xiao 提交于
Define MQD abstract layer for hw ip, for the passing mqd configuration not only from ring but more sources, like user queue. Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Add parmeter to shows whether SCPM feature is enabled or not, and whether is valid. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Unify hdp related function into hdp structure for hdp version 6. V2: Remove hdp invalidate function as hdp v6 doesn't have read cache. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2022 2 次提交
-
-
由 Likun Gao 提交于
Add function to decode IP version. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Extend HWIP MAX INSTANCE to 11. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2022 1 次提交
-
-
由 Evan Quan 提交于
With this, we can support more CG flags. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 3月, 2022 2 次提交
-
-
由 Christian König 提交于
No function change, just move a bunch of definitions from amdgpu.h into separate header files. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Ruijing Dong 提交于
vcn fwlog is for debugging purpose only, by default, it is disabled. Signed-off-by: NRuijing Dong <ruijing.dong@amd.com> Reviewed-by: NLeo Liu <leo.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 2月, 2022 1 次提交
-
-
由 Alex Sierra 提交于
This parameter controls xGMI p2p communication, which is enabled by default. However, it can be disabled by setting it to 0. In case xGMI p2p is disabled in a dGPU, PCIe p2p interface will be used instead. This parameter is ignored in GPUs that do not support xGMI p2p configuration. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Acked-by: NLuben Tuikov <luben.tuikov@amd.com> Acked-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 2月, 2022 5 次提交
-
-
由 Somalapuram Amaranath 提交于
List of register populated for dump collection during the GPU reset. Signed-off-by: NSomalapuram Amaranath <Amaranath.Somalapuram@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This test is not particularly useful now that GTT and GART are decoupled in the driver. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Now that we expose the benchmarks via debugfs, there is no longer a need for the module parameter. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
To avoid multiple runs in parallel to avoid mixing results. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
So we can tell when this function fails. v2: squash in error handling fix (Alex) Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 2月, 2022 1 次提交
-
-
由 Mario Limonciello 提交于
Evaluating `pcie_aspm_enabled` as part of driver probe has the implication that if one PCIe bridge with an AMD GPU connected doesn't support ASPM then none of them do. This is an invalid assumption as the PCIe core will configure ASPM for individual PCIe bridges. Create a new helper function that can be called by individual dGPUs to react to the `amdgpu_aspm` module parameter without having negative results for other dGPUs on the PCIe bus. Suggested-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 2月, 2022 1 次提交
-
-
由 Luben Tuikov 提交于
Add IP discovery data in sysfs. The format is: /sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attrs> where, X is the card ID, an integer, D is the die ID, an integer, B is the IP HW ID, an integer, aka block type, I is the IP HW ID instance, an integer. <attrs> are the attributes of the block instance. At the moment these include HW ID, instance number, major, minor, revision, number of base addresses, and the base addresses themselves. A symbolic link of the acronym HW ID is also created, under D/, if you prefer to browse by something humanly accessible. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Tom StDenis <tom.stdenis@amd.com> Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 2月, 2022 1 次提交
-
-
由 Andrey Grodzovsky 提交于
We should have a single instance per entrire reset domain. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74116.html
-