Commits · b45aeb2dea9142d4d32fa3a117ba381d84f27065 · openeuler / Kernel

Apr 10, 2021 · 23 commits

drm/amdgpu: split mmhub callbacks into ras and non-ras ones · 8bc7b360

Committed by Hawking Zhang on Mar 19, 2021

mmhub RAS is only available in certain mmhub IP
generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
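
One way to picture the split described in this commit is a callback table that every mmhub generation provides, plus an optional RAS table that stays NULL where RAS is unavailable. The following is a minimal sketch under that assumption; every struct, function, and value in it is hypothetical and does not reproduce the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback tables; the real amdgpu structures differ. */
struct mmhub_funcs {
    int (*init)(void);
};

struct mmhub_ras_funcs {
    int (*query_error_count)(void);
};

struct mmhub {
    const struct mmhub_funcs *funcs;         /* always provided */
    const struct mmhub_ras_funcs *ras_funcs; /* NULL when RAS is unsupported */
};

/* Demo callbacks with made-up return values. */
int demo_init(void) { return 0; }
int demo_query_error_count(void) { return 3; }

const struct mmhub_funcs demo_funcs = { .init = demo_init };
const struct mmhub_ras_funcs demo_ras_funcs = {
    .query_error_count = demo_query_error_count,
};

/* Returns the error count, or -1 when this generation has no RAS table. */
int mmhub_query_errors(const struct mmhub *hub)
{
    assert(hub != NULL && hub->funcs != NULL);
    if (hub->ras_funcs == NULL || hub->ras_funcs->query_error_count == NULL)
        return -1;
    return hub->ras_funcs->query_error_count();
}
```

With this layout, callers only need a single NULL check on the RAS table instead of testing each RAS callback individually.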

drm/amdgpu: indirect register access for nv12 sriov · 5e025531

Committed by Peng Ju Zhou on Mar 22, 2021

1. Expand the rlcg interface for gc & mmhub indirect access
2. Add an rlcg interface for the no-kiq case

v2: squash in fix for gfx9 (Changfeng)
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: indirect register access for nv12 sriov · 77eabc6f

Committed by Peng Ju Zhou on Mar 29, 2021

Get the pf2vf message info as early as possible so that
the guest driver can use it to decide whether indirect
register access is enabled.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Reset error code for 'no handler' case · 404b277b

Committed by Lijo Lazar on Mar 26, 2021

If reset handler is not implemented, reset error before proceeding.

Fixes issue with the following trace -
[  106.508592] amdgpu 0000:b1:00.0: amdgpu: ASIC reset failed with error, -38 for drm dev, 0000:b1:00.0
[  106.508972] amdgpu 0000:b1:00.0: amdgpu: GPU reset succeeded, trying to resume
[  106.509116] [drm] PCIE GART of 512M enabled.
[  106.509120] [drm] PTB located at 0x0000008000000000
[  106.509136] [drm] VRAM is lost due to GPU reset!
[  106.509332] [drm] PSP is resuming...
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-and-tested-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
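
The -38 in the trace above matches -ENOSYS on Linux, the conventional "not implemented" code. A minimal sketch of the control flow the message describes might look like the following: a missing handler yields -ENOSYS, the caller clears that code, and the reset continues on the fallback path. All function names here are hypothetical stand-ins, not the driver's real entry points.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type; it only models the control flow. */
typedef int (*reset_handler_fn)(void);

/* Returns -ENOSYS when the ASIC provides no reset handler. */
int try_reset_handler(reset_handler_fn handler)
{
    if (handler == NULL)
        return -ENOSYS;
    return handler();
}

/* Stand-ins: a fallback path that succeeds and a handler that fails. */
int legacy_reset(void) { return 0; }
int failing_handler(void) { return -EIO; }

int asic_reset(reset_handler_fn handler)
{
    int r = try_reset_handler(handler);

    /* A real failure from an implemented handler is reported as-is. */
    if (r != -ENOSYS)
        return r;

    /* "No handler" is not a failure: reset the error code, then go on. */
    r = 0;
    assert(r == 0);
    return legacy_reset();
}
```

Without the `r = 0` step, a stale -ENOSYS could be reported as a reset failure even though the fallback path went on to succeed, which is the symptom in the log.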

drm/amdgpu: Enable recovery on aldebaran · ea4e96a7

Committed by Lijo Lazar on Mar 23, 2021

Add aldebaran to the devices that support recovery.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Make set PG/CG state functions public · 5d89bb2d

Committed by Lijo Lazar on Mar 16, 2021

Expose the PG/CG set-state functions to other clients.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset control handling to reset workflow · 04442bf7

Committed by Lijo Lazar on Mar 16, 2021

This prefers reset-control-based handling if it is implemented
for a particular ASIC. If not, it takes the legacy path. It uses
the legacy method of preparing the environment (job, scheduler
tasks) and restoring it.

v2: remove unused variable (Alex)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu implement tdr advanced mode · e6c6338f

Committed by Jack Zhang on Mar 08, 2021

[Why]
The previous tdr design treats the first job in job_timeout as the bad
job. But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout, because the gfx and compute rings
share internal GC HW.

[How]
This patch implements an advanced tdr mode. It involves an additional
synchronous pre-resubmit step (Step0 Resubmit) before the normal
resubmit step in order to find the real bad job.

1. At the Step0 Resubmit stage, it synchronously submits the first job
and waits for it to be signaled. If it times out, we identify it as
guilty and do a hw reset. After that, we do the normal resubmit step to
resubmit the remaining jobs.

2. For a whole gpu reset (vram lost), resubmit in the old way.

v2: squash in build fix (Alex)
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
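
The Step0 Resubmit idea above can be modeled in a few lines: run the first pending job of each ring on its own, and blame only a job that still fails to signal when run alone. This is a simplified simulation, assuming a boolean stands in for "the fence timed out"; the struct, the helper names, and the reset counter are hypothetical and are not the scheduler's real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job model: hangs_hw marks a job that never signals. */
struct job {
    const char *name;
    bool hangs_hw;
    bool guilty;
};

/* Stand-in for "submit synchronously and wait for the fence". */
bool run_alone_signals(const struct job *j)
{
    return !j->hangs_hw;
}

/*
 * first_jobs holds the first pending job of each ring.
 * Marks every job that times out on its own as guilty and
 * returns how many hardware resets the pass would trigger.
 */
int step0_resubmit(struct job *first_jobs[], size_t n)
{
    int resets = 0;

    for (size_t i = 0; i < n; i++) {
        assert(first_jobs[i] != NULL);
        if (run_alone_signals(first_jobs[i]))
            continue;               /* job is fine when run alone */
        first_jobs[i]->guilty = true;
        resets++;                   /* stand-in for the hw reset */
    }
    return resets;
}
```

In the scenario from the [Why] section, the gfx job signals when run alone and only the compute job is marked guilty, even though the gfx job was the one that timed out first.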

drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit · 36000c7a

Committed by Tian Tao on Mar 24, 2021

Fix the following coccicheck warnings:
drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
use scnprintf or sprintf
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
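
In the kernel, sysfs_emit(buf, fmt, ...) formats into the page-sized buffer handed to a sysfs show() callback and returns the number of bytes written. The sketch below is a userspace approximation of that behavior so it can be compiled outside the kernel; sketch_sysfs_emit(), SKETCH_PAGE_SIZE, and vram_total_show() are hypothetical names, and the real helper performs additional checks on the buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096

/*
 * Userspace stand-in for sysfs_emit(): format into a page-sized
 * buffer and return the number of bytes actually stored, never
 * more than the buffer can hold.
 */
int sketch_sysfs_emit(char *buf, const char *fmt, ...)
{
    va_list args;
    int len;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    len = vsnprintf(buf, SKETCH_PAGE_SIZE, fmt, args);
    va_end(args);

    if (len < 0)
        return 0;                    /* formatting error */
    if (len >= SKETCH_PAGE_SIZE)
        return SKETCH_PAGE_SIZE - 1; /* output was truncated */
    return len;
}

/* A show()-style callback as it might look after the conversion. */
int vram_total_show(char *buf, unsigned long long bytes)
{
    return sketch_sysfs_emit(buf, "%llu\n", bytes);
}
```

The design point is that the callback no longer passes a size or trusts an unbounded sprintf(); the helper itself knows the buffer is one page.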

drm/amdgpu: move vram recover into sriov full access · 437f3e0b

Committed by Horace Chen on Mar 23, 2021

[what]
Currently the driver recovers vram after full access, which may hit
a corner case: meanwhile, another whole gpu reset may be triggered
by another VF, which causes the vram recovery to fail and then
fails the whole device reset.

[how]
Move the vram recovery into full access, so another bad VF will
not disturb the recovery sequence for this VF.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Monk.Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip kfd suspend/resume for S0ix · 5d3a2d95

Committed by Alex Deucher on Mar 16, 2021

GFX is in gfxoff mode during s0ix so we shouldn't need to
actually tear anything down and restore it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop S0ix checks around CG/PG in suspend · 50ec83f0

Committed by Alex Deucher on Mar 16, 2021

We handle it properly within the CG/PG functions directly
now.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip CG/PG for gfx during S0ix · 5d70a549

Committed by Pratik Vishwakarma on Mar 16, 2021

Not needed as the device is in gfxoff state so the CG/PG state
is handled just like it would be for gfxoff during runtime gfxoff.

This should also prevent delays on resume.

Reworked from Pratik's original patch (Alex)
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>

drm/amdgpu: update comments about s0ix suspend/resume · 32ff160d

Committed by Alex Deucher on Mar 16, 2021

Provide an explanation as to why we skip GFX and PSP for
S0ix.  GFX goes into gfxoff, same as runtime, so no need
to tear down and re-init.  PSP is part of the always on
state, so no need to touch it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend · f9370087

Committed by Alex Deucher on Mar 12, 2021

The SMU expects CGPG to be enabled when entering S0ix.
With this we can re-enable SMU suspend.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: re-enable suspend phase 2 for S0ix · 557f42a2

Committed by Alex Deucher on Mar 12, 2021

This really needs to be done to properly tear down
the device.  SMC, PSP, and GFX are still problematic;
we need to dig deeper into which aspect of them is
problematic.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move s0ix check into amdgpu_device_ip_suspend_phase2 (v3) · 34416931

Committed by Alex Deucher on Mar 12, 2021

No functional change.

v2: use correct dev
v3: rework
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up non-DC suspend/resume handling · a2e15b0e

Committed by Alex Deucher on Mar 19, 2021

Move the non-DC specific code into the DCE IP blocks similar
to how we handle DC.  This cleans up the common suspend
and resume paths.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rework S3/S4/S0ix state handling · 62498733

Committed by Alex Deucher on Mar 12, 2021

Set flags at the top level pmops callbacks to track
state.  This cleans up the current set of flags and
properly handles S4 on S0ix capable systems.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
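
As a rough illustration of "set flags at the top level pmops callbacks", the sketch below lets each entry point record which sleep state is in progress, while the shared suspend code only reads those flags. It is a simplified model built on assumptions: the struct, the flag names, the callback names, and the two suspend paths are all hypothetical, and the real driver logic is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this models the idea, not the driver. */
enum suspend_path { PATH_NONE, PATH_FULL_TEARDOWN, PATH_S0IX_LIGHT };

struct dev_state {
    bool s0ix_capable;           /* platform supports S0ix */
    bool in_s3, in_s4, in_s0ix;  /* written only by the top-level callbacks */
    enum suspend_path last_path; /* what the common suspend code chose */
};

/* Common suspend code: consults the flags instead of re-deriving state. */
void device_suspend(struct dev_state *d)
{
    assert(d != NULL);
    d->last_path = d->in_s0ix ? PATH_S0IX_LIGHT : PATH_FULL_TEARDOWN;
}

/* Top-level suspend callback: S0ix on capable systems, otherwise S3. */
void pmops_suspend(struct dev_state *d)
{
    if (d->s0ix_capable)
        d->in_s0ix = true;
    else
        d->in_s3 = true;
    device_suspend(d);
}

/* Top-level resume callback: clear what pmops_suspend() set. */
void pmops_resume(struct dev_state *d)
{
    d->in_s0ix = false;
    d->in_s3 = false;
}

/* Top-level freeze callback (hibernation, S4): never sets in_s0ix. */
void pmops_freeze(struct dev_state *d)
{
    d->in_s4 = true;
    device_suspend(d);
}

/* Top-level thaw callback: clear what pmops_freeze() set. */
void pmops_thaw(struct dev_state *d)
{
    d->in_s4 = false;
}
```

Because the freeze callback sets only the S4 flag, hibernation on an S0ix-capable system takes the full teardown path in this model, which is the case the commit message calls out.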

drm/amdgpu: fix the hibernation suspend with s0ix · e5192f7b

Committed by Prike Liang on Mar 09, 2021

During system hibernation, suspend still needs to un-gate gfx CG/PG
first to handle the HW status check before HW resources are destroyed.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disentangle HG systems from vgaswitcheroo · b98c6299

Committed by Alex Deucher on Mar 10, 2021

There's no need to keep vgaswitcheroo around for HG
systems.  They don't use muxes and their power control
is handled via ACPI.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: disable gpu reset on Vangogh for now" · fe68ceef

Committed by Xiaojian Du on Mar 18, 2021

This reverts commit 33cf440d.
It enables mode-2 gpu reset for vangogh, which
requires PSP firmware version 00.1A.00.0F or newer.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add codes to capture invalid hardware access when recovery · 56b53c0b

Committed by Dennis Li on Mar 10, 2021

Once the recovery thread has begun a GPU reset, no other thread
should access the hardware; otherwise the system randomly hangs.

v2 (chk): rewritten from scratch, use trylock and lockdep instead of
hand wiring the logic.

v3: add in_irq check

v4: change to check in_task
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Mar 24, 2021 · 12 commits

drm/amdgpu: drop extraneous hw_status update · 2d28b70e

Committed by Alex Deucher on Mar 15, 2021

We set the same variable a few lines above.  Drop the duplicate
setting.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable light SBR in XGMI+passthrough configuration · 2d02893f

Committed by shaoyunl on Mar 11, 2021

This fixes the case where light SBR mode is only enabled
on normal device init. The feature actually needs to be enabled
after the ASIC has been reset as well.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop legacy IO bar support · e99d2eaa

Committed by Alex Deucher on Mar 15, 2021

It was leftover from radeon where it was required for some
specific old hardware.  It hasn't been required for ages
and the driver already falls back to MMIO when legacy IO
is not available.  Legacy IO also seems to be problematic
on some thunderbolt devices.  Drop it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>

drm/amdgpu: nuke the ih reentrant lock · d423f551

Committed by Christian König on Mar 12, 2021

Interrupts are non-reentrant on Linux. This is just an ancient
leftover from radeon where irq processing was kicked off from
different places.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix send ras disable cmd when asic not support ras · 970fd197

Committed by Stanley.Yang on Mar 10, 2021

cause:
    It is necessary to send the ras disable command to ras-ta during
    gfx block ras late init, because the ras capability is read as
    disabled from the vbios for vega20 gaming. But the ras context is
    released during the ras init process, which causes sending the
    ras disable command to ras-ta to fail.
how:
    Delay releasing the ras context; it will be released after
    gfx block late init is done.

Changed from V1:
    move release_ras_context into ras_resume

Changed from V2:
    checking BIT(UMC) before accessing the eeprom table is more reasonable
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix spelling mistake "disabed" -> "disabled" · 751f43e7

Committed by Colin Ian King on Mar 11, 2021

There is a spelling mistake in a drm debug message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable light SBR for SMU on passthrough and XGMI configuration · 3ae3a4ad

Committed by shaoyunl on Mar 10, 2021

The SMU introduces a new interface to enable light Secondary Bus Reset
mode; the driver enables it on passthrough + XGMI configurations.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Reset the devices in the XGMI hive duirng probe · e3c1b071

Committed by shaoyunl on Feb 16, 2021

In a passthrough configuration, the hypervisor triggers the SBR
(Secondary Bus Reset) on the devices without synchronizing them with
each other. This can cause a device hang, since in an XGMI
configuration all the devices within the hive need to be reset within
a limited time slot. This series of patches tries to solve the issue
by cooperating with the new SMU, which only does minimal housekeeping
in response to the SBR request but does not do the real reset job,
leaving it to the driver. The driver needs to do the whole sw init and
minimal HW init to bring up the SMU and trigger the reset (possibly
BACO) on all the ASICs at the same time.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add reset_list for device list used for reset · 655ce9cb

Committed by shaoyunl on Mar 04, 2021

The gmc.xgmi.head list was originally designed as the device list of
the XGMI hive. Also using it for reset purposes would prevent the reset
function from adjusting the XGMI device list, which is required in the
next change.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
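
The motivation above can be shown with a device that carries two independent links, one for hive membership and one for the reset in progress. The sketch below uses plain singly linked pointers rather than the kernel's list_head, and all names are hypothetical; it only demonstrates that editing the hive list leaves the reset list intact once the two are separate.

```c
#include <assert.h>
#include <stddef.h>

struct dev {
    int id;
    struct dev *hive_next;  /* membership in the XGMI hive list */
    struct dev *reset_next; /* membership in the per-reset list */
};

/* Snapshot the current hive membership into a separate reset list. */
struct dev *build_reset_list(struct dev *hive_head)
{
    for (struct dev *d = hive_head; d != NULL; d = d->hive_next)
        d->reset_next = d->hive_next;
    return hive_head;
}

/* Unlink one device from the hive list only; returns the new hive head. */
struct dev *hive_remove(struct dev *hive_head, struct dev *victim)
{
    struct dev **link = &hive_head;

    assert(victim != NULL);
    while (*link != NULL && *link != victim)
        link = &(*link)->hive_next;
    if (*link == victim) {
        *link = victim->hive_next;
        victim->hive_next = NULL;
    }
    return hive_head;
}

size_t hive_len(const struct dev *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->hive_next)
        n++;
    return n;
}

size_t reset_len(const struct dev *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->reset_next)
        n++;
    return n;
}
```

If the reset code walked the hive links directly, removing a device from the hive in the middle of a reset would also drop it from the set of devices being reset.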

drm/amdgpu: Add kfd init_complete flag to check from amdgpu side · 8e2712e7

Committed by shaoyunl on Feb 16, 2021

The amdgpu driver may be in a reset state during init, in which case
it does not initialize the KFD. The driver needs to initialize the KFD
after the reset by checking the flag.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retire aldebaran gpu_info firmware · 44b3253a

Committed by Hawking Zhang on Nov 16, 2020

The driver should use the gfx_info atomfirmware interface.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:add smu mode1/2 support for aldebaran · 5c03e584

Committed by Feifei Xu on Nov 19, 2020

Use MSG_GfxDriverReset for mode reset and retire MSG_Mode1Reset.
Centralize the soc15_asic_mode1_reset() and nv_asic_mode1_reset() functions.
Add mode2_reset_is_support() for smu->ppt_funcs.
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Mar 23, 2021 · 5 commits

drm/amdgpu: skip kfd suspend/resume for S0ix · ac5789ef

Committed by Alex Deucher on Mar 16, 2021

GFX is in gfxoff mode during s0ix so we shouldn't need to
actually tear anything down and restore it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop S0ix checks around CG/PG in suspend · 26470500

Committed by Alex Deucher on Mar 16, 2021

We handle it properly within the CG/PG functions directly
now.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip CG/PG for gfx during S0ix · 10cb67eb

Committed by Pratik Vishwakarma on Mar 16, 2021

Not needed as the device is in gfxoff state so the CG/PG state
is handled just like it would be for gfxoff during runtime gfxoff.

This should also prevent delays on resume.

Reworked from Pratik's original patch (Alex)
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>

drm/amdgpu: update comments about s0ix suspend/resume · 9bb735ab

Committed by Alex Deucher on Mar 16, 2021

Provide an explanation as to why we skip GFX and PSP for
S0ix.  GFX goes into gfxoff, same as runtime, so no need
to tear down and re-init.  PSP is part of the always on
state, so no need to touch it.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/swsmu: skip gfx cgpg on s0ix suspend · 4021229e

Committed by Alex Deucher on Mar 12, 2021

The SMU expects CGPG to be enabled when entering S0ix.
With this we can re-enable SMU suspend.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel · synced successfully more than 1 year ago
