- 10 4月, 2021 23 次提交
-
-
由 Hawking Zhang 提交于
mmhub ras is only avaiable in cerntain mmhub ip generation. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NDennis Li <Dennis.Li@amd.com> Reviewed-by: NJohn Clements <John.Clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Peng Ju Zhou 提交于
1. expand rlcg interface for gc & mmhub indirect access 2. add rlcg interface for no kiq v2: squash in fix for gfx9 (Changfeng) Signed-off-by: NPeng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: NEmily.Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Peng Ju Zhou 提交于
get pf2vf msg info at it's earliest time so that guest driver can use these info to decide whether register indirect access enabled. Signed-off-by: NPeng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: NEmily.Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lijo Lazar 提交于
If reset handler is not implemented, reset error before proceeding. Fixes issue with the following trace - [ 106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0 [ 106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume [ 106.509116] [drm] PCIE GART of 512M enabled. [ 106.509120] [drm] PTB located at 0x0000008000000000 [ 106.509136] [drm] VRAM is lost due to GPU reset! [ 106.509332] [drm] PSP is resuming... Signed-off-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-and-tested-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lijo Lazar 提交于
Add aldebaran to devices which support recovery Signed-off-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lijo Lazar 提交于
Expose PG/CG set states functions for other clients Signed-off-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lijo Lazar 提交于
This prefers reset control based handling if it's implemented for a particular ASIC. If not, it takes the legacy path. It uses the legacy method of preparing environment (job, scheduler tasks) and restoring environment. v2: remove unused variable (Alex) Signed-off-by: NLijo Lazar <lijo.lazar@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Zhang 提交于
[Why] Previous tdr design treats the first job in job_timeout as the bad job. But sometimes a later bad compute job can block a good gfx job and cause an unexpected gfx job timeout because gfx and compute ring share internal GC HW mutually. [How] This patch implements an advanced tdr mode.It involves an additinal synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit step in order to find the real bad job. 1. At Step0 Resubmit stage, it synchronously submits and pends for the first job being signaled. If it gets timeout, we identify it as guilty and do hw reset. After that, we would do the normal resubmit step to resubmit left jobs. 2. For whole gpu reset(vram lost), do resubmit as the old way. v2: squash in build fix (Alex) Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tian Tao 提交于
Fix the following coccicheck warning: drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING: use scnprintf or sprintf drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING: use scnprintf or sprintf Signed-off-by: NTian Tao <tiantao6@hisilicon.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Horace Chen 提交于
[what] currently driver recover vram after full access, which may hit a corner case that meanwhile another whole gpu reset may be triggered by another VF, which will cause vram recover fail then fail the whole device reset. [how] move the recover vram into full access. So another bad VF will not disturb the recover sequence for this vf. Signed-off-by: NHorace Chen <horace.chen@amd.com> Reviewed by: Monk.Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
GFX is in gfxoff mode during s0ix so we shouldn't need to actually tear anything down and restore it. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
We handle it properly within the CG/PG functions directly now. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Pratik Vishwakarma 提交于
Not needed as the device is in gfxoff state so the CG/PG state is handled just like it would be for gfxoff during runtime gfxoff. This should also prevent delays on resume. Reworked from Pratik's original patch (Alex) Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NPratik Vishwakarma <Pratik.Vishwakarma@amd.com>
-
由 Alex Deucher 提交于
Provide and explanation as to why we skip GFX and PSP for S0ix. GFX goes into gfxoff, same as runtime, so no need to tear down and re-init. PSP is part of the always on state, so no need to touch it. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
The SMU expects CGPG to be enabled when entering S0ix. with this we can re-enable SMU suspend. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This really needs to be done to properly tear down the device. SMC, PSP, and GFX are still problematic, need to dig deeper into what aspect of them that is problematic. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
No functional change. v2: use correct dev v3: rework Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Move the non-DC specific code into the DCE IP blocks similar to how we handle DC. This cleans up the common suspend and resume pathes. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Set flags at the top level pmops callbacks to track state. This cleans up the current set of flags and properly handles S4 on S0ix capable systems. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Prike Liang 提交于
During system hibernation suspend still need un-gate gfx CG/PG firstly to handle HW status check before HW resource destory. Signed-off-by: NPrike Liang <Prike.Liang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
There's no need to keep vgaswitcheroo around for HG systems. They don't use muxes and their power control is handled via ACPI. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Xiaojian Du 提交于
This reverts commit 33cf440d. And it will enable mode-2 gpu reset for vangogh, it asks PSP firmware version is 00.1A.00.0F or newer. Signed-off-by: NXiaojian Du <Xiaojian.Du@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dennis Li 提交于
When recovery thread has begun GPU reset, there should be not other threads to access hardware, otherwise system randomly hang. v2 (chk): rewritten from scratch, use trylock and lockdep instead of hand wiring the logic. v3: add in_irq check v4: change to check in_task Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 3月, 2021 12 次提交
-
-
由 Alex Deucher 提交于
We set the same variable a few lines above. Drop the duplicate setting. Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
This is to fix the case where it only enable the light SMU on normal device init. This feature actually need to be enabled after ASIC been reset as well. Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
It was leftover from radeon where it was required for some specific old hardware. It hasn't been required for ages and the driver already falls back to MMIO when legacy IO is not available. Legacy IO also seems to be problematic on on some thunderbolt devices. Drop it. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
-
由 Christian König 提交于
Interrupts on are non-reentrant on linux. This is just an ancient leftover from radeon where irq processing was kicked of from different places. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Stanley.Yang 提交于
cause: It is necessary to send ras disable command to ras-ta during gfx block ras later init, because the ras capability is disable read from vbios for vega20 gaming, but the ras context is released during ras init process, this will cause send ras disable command to ras-to failed. how: Delay releasing ras context, the ras context will be released after gfx block later init done. Changed from V1: move release_ras_context into ras_resume Changed from V2: check BIT(UMC) is more reasonable before access eeprom table Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Colin Ian King 提交于
There is a spelling mistake in a drm debug message. Fix it. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
SMU introduce the new interface to enable light Secondary Bus Reset mode, driver enable it on passthrough + XGMI configuration Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices without sync to each other. This could cause device hang since for XGMI configuration, all the devices within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
The gmc.xgmi.head list originally is designed for device list in the XGMI hive. Mix use it for reset purpose will prevent the reset function to adjust XGMI device list which is required in next change Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
amdgpu driver may be in reset state during init which will not initialize the kfd, driver need to initialize the KFD after reset by check the flag Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
driver should use the gfx_info atomfirmware interface Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Feifei Xu 提交于
Use MSG_GfxDriverReset for mode reset and retire MSG_Mode1Reset. Centralize soc15_asic_mode1_reset() and nv_asic_mode1_reset()functions. Add mode2_reset_is_support() for smu->ppt_funcs. Signed-off-by: NFeifei Xu <Feifei.Xu@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 3月, 2021 5 次提交
-
-
由 Alex Deucher 提交于
GFX is in gfxoff mode during s0ix so we shouldn't need to actually tear anything down and restore it. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
We handle it properly within the CG/PG functions directly now. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Pratik Vishwakarma 提交于
Not needed as the device is in gfxoff state so the CG/PG state is handled just like it would be for gfxoff during runtime gfxoff. This should also prevent delays on resume. Reworked from Pratik's original patch (Alex) Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NPratik Vishwakarma <Pratik.Vishwakarma@amd.com>
-
由 Alex Deucher 提交于
Provide and explanation as to why we skip GFX and PSP for S0ix. GFX goes into gfxoff, same as runtime, so no need to tear down and re-init. PSP is part of the always on state, so no need to touch it. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
The SMU expects CGPG to be enabled when entering S0ix. with this we can re-enable SMU suspend. Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-