- 03 2月, 2022 12 次提交
-
-
由 Shen, George 提交于
[Why] Certain configurations will result in link encoder to not be assigned to the link at the time we apply cable ID logic. We should skip it in those cases. [How] Check if link_enc is not null before applying cable ID. Tested-by: NDaniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: NWenjing Liu <Wenjing.Liu@amd.com> Acked-by: NStylon Wang <stylon.wang@amd.com> Signed-off-by: NGeorge Shen <george.shen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Guchun Chen 提交于
A lot of below message are outputed in SRIOV case. amdgpu: indirect registers access through rlcg is not supported Also drop redundant ret set, as it's initialized to be false already. Fixes: 29dbcac8 ("drm/amdgpu: add helper to query rlcg reg access flag") Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lijo Lazar 提交于
Fix uninitialized variable use warning: variable 'reg_access_ctrl' is uninitialized when used here [-Wuninitialized] scratch_reg0 = (void __iomem *)adev->rmmio + 4 * reg_access_ctrl->scratch_reg0; Fixes: 5d447e29 ("drm/amdgpu: add helper for rlcg indirect reg access") Reported-by: Nkernel test robot <yujie.liu@intel.com> Signed-off-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1. The infinite loop causing soft lock occurs on multiple amdgpu cards supporting ras feature. 2. This a workaround patch to fix 6492e1b0. It is valid for multiple amdgpu cards of the same type. 3. The root cause is that each GPU card device has a separate .ras_list link header, but the instance and linked list node of each ras block are unique. When each device is initialized, each ras instance will repeatedly add link node to the device every time. In this way, only the .ras_list of the last initialized device is completely correct. the .ras_list->prev and .ras_list->next of the device initialzied before can still point to the correct ras instance, but the prev pointer and next pointer of the pointed ras instance both point to the last initialized device's .ras_ list instead of the beginning .ras_ list. When using list_for_each_entry_safe searches for non-existent Ras nodes on devices other than the last device, the last ras instance next pointer cannot always be equal to the beginning .ras_list, so that the loop cannot be terminated, the program enters a infinite loop. BTW: Since the data and initialization process of each card are the same, the link list between ras instances will not be destroyed every time the device is initialized. 4. The soft locked logs are as follows: [ 262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu [ 262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020 [ 262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu] [ 262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45 [ 262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202 [ 262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248 [ 262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000 [ 262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001 [ 262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e [ 262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003 [ 262.166258] FS: 0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000 [ 262.166261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0 [ 262.166267] Call Trace: [ 262.166272] amdgpu_ras_do_recovery+0x130/0x290 [amdgpu] [ 262.166529] ? psi_task_switch+0xd2/0x250 [ 262.166537] ? __switch_to+0x11d/0x460 [ 262.166542] ? __switch_to_asm+0x36/0x70 [ 262.166549] process_one_work+0x220/0x3c0 [ 262.166556] worker_thread+0x4d/0x3f0 [ 262.166560] ? process_one_work+0x3c0/0x3c0 [ 262.166563] kthread+0x12b/0x150 [ 262.166568] ? set_kthread_struct+0x40/0x40 [ 262.166571] ret_from_fork+0x22/0x30 Fixes: 6492e1b0 ("drm/amdgpu: Unify ras block interface for each ras block") Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Agustin Gutierrez 提交于
[Why] There is underflow / visual corruption DCN301, for high bandwidth MST DSC configurations such as 2x1440p144 or 2x4k60. [How] Use up-to-date watermark values for DCN301. Reviewed-by: NZhan Liu <zhan.liu@amd.com> Signed-off-by: NAgustin Gutierrez <agustin.gutierrez@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
All warnings (new ones prefixed by >>): drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c: In function 'svm_range_deferred_list_work': >> drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_svm.c:2067:22: warning: variable 'p' set but not used [-Wunused-but-set-variable] 2067 | struct kfd_process *p; | Fixes: 367c9b0f ("drm/amdkfd: Ensure mm remain valid in svm deferred_list work") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-By: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Colin Ian King 提交于
There are quite a few spelling mistakes in various function names and error messages. Fix these. Reviewed-by: NHarry Wentland <harry.wentland@amd.com> Signed-off-by: NColin Ian King <colin.i.king@gmail.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Changcheng Deng 提交于
'amdgpu_dpm.h' included in 'arcturus_ppt.c' is duplicated. Reported-by: NZeal Robot <zealci@zte.com.cn> Signed-off-by: NChangcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Changcheng Deng 提交于
'linux/pci.h' included in 'amdgpu_device.c' is duplicated. Reported-by: NZeal Robot <zealci@zte.com.cn> Signed-off-by: NChangcheng Deng <deng.changcheng@zte.com.cn> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lang Yu 提交于
We observed a GPU hang when querying GMC CG state(i.e., cat amdgpu_pm_info) on cyan skillfish. Acctually, cyan skillfish doesn't support any CG features. Just prevent it from accessing GMC CG registers. Signed-off-by: NLang Yu <Lang.Yu@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mario Limonciello 提交于
This will cause misconfigured systems to not run the GPU suspend routines. * In APUs that are properly configured system will go into s2idle. * In APUs that are intended to be S3 but user selects s2idle the GPU will stay fully powered for the suspend. * In APUs that are intended to be s2idle and system misconfigured the GPU will stay fully powered for the suspend. * In systems that are intended to be s2idle, but AMD dGPU is also present, the dGPU will go through S3 Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mario Limonciello 提交于
This will be used to help make decisions on what to do in misconfigured systems. v2: squash in semicolon fix from Stephen Rothwell Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 2月, 2022 1 次提交
-
-
由 Mario Limonciello 提交于
On some OEM setups users can configure the BIOS for S3 or S2idle. When configured to S3 users can still choose 's2idle' in the kernel by using `/sys/power/mem_sleep`. Before commit 6dc8265f ("drm/amdgpu: always reset the asic in suspend (v2)"), the GPU would crash. Now when configured this way, the system should resume but will use more power. As such, adjust the `amdpu_acpi_is_s0ix function` to warn users about potential power consumption issues during their first attempt at suspending. Reported-by: NBjoren Dasse <bjoern.daase@gmail.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1824Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 1月, 2022 26 次提交
-
-
由 huangqu 提交于
Wrong order for config and counter_id parameters was passed, when calling df_v3_6_pmc_set_deferred and df_v3_6_pmc_is_deferred functions. Signed-off-by: Nhuangqu <jinsdb@126.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 tangmeng 提交于
There is a spelling mistake. Fix it. Signed-off-by: Ntangmeng <tangmeng@uniontech.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
So mesa and tools know when this is available. Mesa MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Add a new CTX ioctl operation to set stable pstates for profiling. When creating traces for tools like RGP or using SPM or doing performance profiling, it's required to enable a special stable profiling power state on the GPU. These profiling states set fixed clocks and disable certain other power features like powergating which may impact the results. Historically, these profiling pstates were enabled via sysfs, but this adds an interface to enable it via the CTX ioctl from the application. Since the power state is global only one application can set it at a time, so if multiple applications try and use it only the first will get it, the ioctl will return -EBUSY for others. The sysfs interface will override whatever has been set by this interface. Mesa MR: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/207 v2: don't default r = 0; v3: rebase on Evan's PM cleanup Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Return an error if someone tries to use the i2c bus when the SMU is not running. Otherwise we can end up sending commands to the SMU which will either get ignored or could cause other issues depending on what state the GPU and SMU are in. Cc: Luben.Tuikov@amd.com Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Luben Tuikov 提交于
Enable the FRU EEPROM I2C bus for Sienna Cichlid server boards, for which it is enabled by checking the VBIOS version. Cc: Roy Sun <Roy.Sun@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Luben Tuikov 提交于
Expose both SMU I2C buses. Some boards use the same bus for both the RAS and FRU EEPROMs and others use different buses. This enables the additional I2C bus and sets the right buses to use for RAS and FRU EEPROM access. Cc: Roy Sun <Roy.Sun@amd.com> Co-developed-by: NAlex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
This patch adds 1.3.1/2.4.0 athub clock gating support. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
Use IP version rather than codename for athub. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tim Huang 提交于
[Why] It will build failed with unused variable 'dc' with '-Werror=unused-variable'enabled when CONFIG_DRM_AMD_DC_DCN is not defined. Signed-off-by: NTim Huang <tim.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAaron Liu <aaron.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
On ALDEBARAN, the umc channel bits are not original values, they are hashed. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
On ALDEBARAN, we need to traverse all column bits higher than BIT11(C4C3C2) in a row, the shift of R14 bit should be also taken into account. Retire all pages we find. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
One piece of umc normalizing address can be mapped to 16 pieces of physical address in each umc channel on ALDEBARAN. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Create common amdgpu_umc_fill_error_record function for all versions of UMC and clean up related codes. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mario Limonciello 提交于
A number of BIOS versions have a problem with the watermarks table not being configured properly. This manifests as a very scary looking warning during resume from s0i3. This should be harmless in most cases and is well understood, so decrease the assertion to a clearer warning about the problem. Reviewed-by: NHarry Wentland <harry.wentland@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
On GPUs with RAS, poison can propagate between processes if VRAM is not cleared when it is freed or allocated. The reason is, that not all write accesses clear RAS poison. 32-byte writes by the SDMA engine do clear RAS poison. Clearing memory in the background when it is freed should avoid major performance impact. KFD has been doing this already for a long time. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tianci.Yin 提交于
[why] In rmmod procedure, kfd sends cp a dequeue request, but the request does not get response, then an error message "cp queue pipe 4 queue 0 preemption failed" printed. [how] Performing kfd suspending after disabling gfxoff can fix it. Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NYang Wang <kevinyang.wang@amd.com> Signed-off-by: NTianci.Yin <tianci.yin@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
The sub-routine(amdgpu_gfx_off_ctrl) tried to obtain the lock adev->pm.mutex which was actually hold by amdgpu_dpm_force_performance_level. A deadlock happened then. Signed-off-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
The existing way cannot handle Beige Goby well as a different PPTable data structure(PPTable_beige_goby_t instead of PPTable_t) is used there. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Fangzhi Zuo 提交于
[Why] configure_dp_hpo_throttled_vcp_size() was missing promotion before, but it was covered by not calling the missing function hook in the old interface hpo_dp_link_encoder->funcs. Recent refactor replaces with new caller link_hwss->set_throttled_vcp_size which needs that hook, and that causes null ptr hang. Signed-off-by: NFangzhi Zuo <Jerry.Zuo@amd.com> Reviewed-by: NHarry Wentland <harry.wentland@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
kfd_process_notifier_release flush svm_range_restore_work which calls svm_range_list_lock_and_flush_work to flush deferred_list work, but if deferred_list work mmput release the last user, it will call exit_mmap -> notifier_release, it is deadlock with below backtrace. Move flush svm_range_restore_work to kfd_process_wq_release to avoid deadlock. Then svm_range_restore_work take task->mm ref to avoid mm is gone while validating and mapping ranges to GPU. Workqueue: events svm_range_deferred_list_work [amdgpu] Call Trace: wait_for_completion+0x94/0x100 __flush_work+0x12a/0x1e0 __cancel_work_timer+0x10e/0x190 cancel_delayed_work_sync+0x13/0x20 kfd_process_notifier_release+0x98/0x2a0 [amdgpu] __mmu_notifier_release+0x74/0x1f0 exit_mmap+0x170/0x200 mmput+0x5d/0x130 svm_range_deferred_list_work+0x104/0x230 [amdgpu] process_one_work+0x220/0x3c0 Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reported-by: NRuili Ji <ruili.ji@amd.com> Tested-by: NRuili Ji <ruili.ji@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
svm_deferred_list work should continue to handle deferred_range_list which maybe split to child range to avoid child range leak, and remove ranges mmu interval notifier to avoid mm mm_count leak. So taking mm reference when adding range to deferred list, to ensure mm is valid in the scheduled deferred_list_work, and drop the mm referrence after range is handled. Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reported-by: NRuili Ji <ruili.ji@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
SVM ioctls take proper svms->lock to handle race conditions, don't need take process mutex to serialize ioctls. This also fixes circular locking warning: WARNING: possible circular locking dependency detected Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); lock((work_completion)(&svms->deferred_list_work)); lock(&process->mutex); *** DEADLOCK *** Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Bas Nieuwenhuizen 提交于
Unused. Convert the divisions into asserts on the divisor, to debug why it is zero. The divide by zero is suspected of causing kernel panics. While I have no idea where the zero is coming from I think this patch is a positive either way. Reviewed-by: NHarry Wentland <harry.wentland@amd.com> Signed-off-by: NBas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Eric Huang 提交于
It is to meet the requirement for memory allocation optimization on MI50. Signed-off-by: NEric Huang <jinhuieric.huang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Victor Zhao 提交于
add determine for passthrough mode under arm64 by reading CurrentEL register v2: squash in warning fix (Alex) Signed-off-by: NVictor Zhao <Victor.Zhao@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 1月, 2022 1 次提交
-
-
由 Tim Huang 提交于
Use IP versions rather than asic_type to differentiate IP version specific features. Signed-off-by: NTim Huang <tim.huang@amd.com> Reviewed-by: NAaron Liu <aaron.liu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-