Commits · 40e611bdd1c762fc858ef22e8f206066ce844c44 · openeuler / Kernel

23 Apr, 2020 · 2 commits

drm/amdgpu: update psp fw loading sequence · 40e611bd

Committed by John Clements on Apr 14, 2020

Added a dedicated function to check whether a particular firmware should be skipped during loading.

Added a dedicated function for SMU firmware loading via PSP.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
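To make the "should this firmware be skipped" idea concrete, here is a minimal C sketch. The firmware IDs, the `fw_load_skip_check()` helper, and the two skip conditions (missing image, SMU firmware handled by its own path) are hypothetical illustrations, not the code from this patch, and the actual PSP submission is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware IDs; the real driver has many more. */
enum fw_id { FW_ID_SDMA, FW_ID_SMU, FW_ID_RLC };

struct fw_entry {
	enum fw_id id;
	const void *image;   /* NULL when the firmware file was not found */
	size_t size;
};

/* Hypothetical predicate: one place decides whether the generic PSP loop
 * skips an entry, instead of open-coding the checks inside the loop. */
static bool fw_load_skip_check(const struct fw_entry *fw, bool smu_loaded_separately)
{
	if (fw->image == NULL || fw->size == 0)
		return true;                 /* nothing to load */
	if (fw->id == FW_ID_SMU && smu_loaded_separately)
		return true;                 /* handled by the dedicated SMU path */
	return false;
}

/* Walks the list and counts the entries that would be submitted to the
 * PSP; the submission itself is elided in this sketch. */
static int psp_load_all(const struct fw_entry *list, int n, bool smu_separate)
{
	int loaded = 0;
	for (int i = 0; i < n; i++) {
		if (fw_load_skip_check(&list[i], smu_separate))
			continue;
		loaded++;
	}
	return loaded;
}
```

Keeping the skip logic in one predicate means the dedicated SMU loading path and the generic loop cannot disagree about which images they own.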

drm/amdgpu: fix the hw hang during perform system reboot and reset · ced1ba97

Committed by Prike Liang on Apr 13, 2020

The system reboot failed because some IP blocks entered power gating before
the hardware resources were destroyed. Meanwhile, using a unified interface to
set the device CGPG to the ungated state simplifies the ungate guard for amdgpu
power-off and reset.

Fixes: 487eca11 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
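The ordering problem can be sketched in plain C: before any block's hardware is torn down, one unified helper ungates every block. The struct, the helper names, and the reverse-order walk are hypothetical simplifications, not the driver's actual interface.

```c
#include <assert.h>
#include <stdbool.h>

enum pg_state { PG_STATE_GATE, PG_STATE_UNGATE };

/* Hypothetical IP block with only the fields this sketch needs. */
struct ip_block {
	const char *name;
	enum pg_state pg;
	bool hw_alive;
};

/* Unified helper: put every block into the requested power-gating state.
 * Walking in reverse init order is an assumption made for teardown. */
static void set_pg_state_all(struct ip_block *ips, int n, enum pg_state s)
{
	for (int i = n - 1; i >= 0; i--)
		ips[i].pg = s;
}

/* Teardown touches block registers, which is unsafe while a block is
 * power gated, so the ungate guard runs first. */
static bool hw_fini_all(struct ip_block *ips, int n)
{
	set_pg_state_all(ips, n, PG_STATE_UNGATE);
	for (int i = n - 1; i >= 0; i--) {
		if (ips[i].pg == PG_STATE_GATE)
			return false;   /* on real hardware this access could hang */
		ips[i].hw_alive = false;
	}
	return true;
}
```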

14 Apr, 2020 · 13 commits

drm/amdgpu: remove dead code in si_dpm.c · 8e2f8420

Committed by Jason Yan on Apr 13, 2020

This code is dead; remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: remove hardcoded module name in prints · dd4fa6c1

Committed by Aurabindo Pillai on Apr 08, 2020

Let the format prefixes take care of printing the module name
through the pr_fmt and dev_fmt definitions.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add print prefix for dev_* variants · 539489fc

Committed by Aurabindo Pillai on Apr 08, 2020

Define the dev_fmt macro for informative print messages.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add prefix for pr_* prints · d57229b1

Committed by Aurabindo Pillai on Apr 08, 2020

amdgpu uses many pr_* calls to print error messages.
With this prefix, the origin of an error is more obvious
to the end user, which may help debugging.

Prefix format:

[xxx.xxxxx] amdgpu: ...
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
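The prefix works through plain string-literal concatenation at compile time. The userspace sketch below mimics that; `pr_err_buf()` is a hypothetical stand-in for `pr_err()` that formats into a buffer instead of the kernel log, and the exact prefix string is assumed from the format shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* As in the kernel, pr_fmt() must be defined before the print macros that
 * expand it; every format string then gets the prefix at compile time. */
#define pr_fmt(fmt) "amdgpu: " fmt

/* Hypothetical userspace stand-in for pr_err(). ##__VA_ARGS__ is a GNU
 * extension (also used by the kernel) that allows zero extra arguments. */
#define pr_err_buf(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is glued onto the literal, call sites no longer need to spell out "amdgpu:" themselves, which is what allows hardcoded module names in messages to be dropped.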

drm/amdgpu/ring: simplify scheduler setup logic · a4c24680

Committed by Alex Deucher on Apr 09, 2020

Set up a GPU scheduler based on the ring flag rather
than the ring type.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
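A minimal sketch of flag-driven scheduler setup, assuming a simplified ring struct: each IP marks rings such as KIQ with `no_scheduler`, and the setup loop consults only that flag. The struct and `setup_schedulers()` are hypothetical; `sched_ready` merely stands in for an initialized DRM GPU scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring descriptor with only the fields the sketch needs. */
struct ring {
	const char *name;
	bool no_scheduler;   /* set by the IP code, e.g. for KIQ */
	bool sched_ready;    /* stands in for an initialized GPU scheduler */
};

/* The decision is per ring, so this loop needs no list of ring types;
 * an IP with asymmetric instances can flag each ring individually. */
static int setup_schedulers(struct ring *rings, int n)
{
	int created = 0;
	for (int i = 0; i < n; i++) {
		if (rings[i].no_scheduler)
			continue;
		rings[i].sched_ready = true;   /* real code would init the scheduler */
		created++;
	}
	return created;
}
```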

drm/amdgpu/kiq: add no_scheduler flag to KIQ · a783910d

Committed by Alex Deucher on Apr 09, 2020

We don't want a GPU scheduler for this ring.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/ring: add no_scheduler flag · cb3d1085

Committed by Alex Deucher on Apr 09, 2020

This allows IPs to flag whether a specific ring requires
a GPU scheduler. For example, instances of an IP are
sometimes asymmetric and have different capabilities.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix wrong vram lost counter increment V2 · dadce777

Committed by Evan Quan on Apr 10, 2020

The VRAM lost counter was wrongly incremented by two during a BACO reset.

V2: assume VRAM is lost for mode1 reset on all ASICs
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
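One way to rule out a double increment is to bump the counter in exactly one place. The sketch below assumes that structure; the enum, `reset_loses_vram()`, and passing the BACO outcome in as a flag are hypothetical simplifications, with only the "mode1 always counts as VRAM lost" rule taken from V2 above.

```c
#include <assert.h>
#include <stdbool.h>

enum reset_method { RESET_BACO, RESET_MODE1 };

struct dev_state {
	unsigned int vram_lost_counter;
};

/* Per V2: mode1 reset is assumed to lose VRAM on all ASICs. For BACO the
 * outcome is passed in; how the driver detects it is out of scope here. */
static bool reset_loses_vram(enum reset_method m, bool baco_detected_loss)
{
	return m == RESET_MODE1 ? true : baco_detected_loss;
}

/* Single accounting point: no reset path touches the counter elsewhere,
 * so one reset can increment it at most once. */
static void do_reset(struct dev_state *d, enum reset_method m, bool baco_loss)
{
	if (reset_loses_vram(m, baco_loss))
		d->vram_lost_counter++;
}
```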

drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS · ed72aa21

Committed by Guchun Chen on Apr 13, 2020

Prefix RAS message printing in the GFX IP with PCI device info,
which assists debugging in the multi-GPU case.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resume kiq access debugfs · d32709da

Committed by Yintian Tao on Apr 13, 2020

If there is no GPU hang, the user can still access
debugfs through KIQ.
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refine ras related message print · 6952e99c

Committed by Guchun Chen on Apr 10, 2020

Prefix RAS-related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This clearly tells the user which
GPU device a RAS message comes from. Also add some
other RAS message prints to make the output clearer
and friendlier.
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb · 1f3ef0ef

Committed by Guchun Chen on Apr 10, 2020

The uncorrectable error count is not printed when a UMC UE is
injected. By the time the error count log function runs in the GPU
recovery work thread, the count from the last error injection can
no longer be read correctly, because the error status register is
automatically cleared after it is read in the UMC ECC IRQ callback.
So print the count in the UMC ECC IRQ callback as well, consistent
with the other RAS error interrupt cases.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
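The reasoning hinges on read-to-clear semantics: whoever reads the status register first is the only one who sees the value. A small simulation, with a hypothetical register struct and callback name, shows why the count has to be captured in the IRQ callback.

```c
#include <assert.h>

/* Hypothetical error-status register with read-to-clear semantics. */
struct umc_regs {
	unsigned int ue_count;
};

static unsigned int read_and_clear(struct umc_regs *r)
{
	unsigned int v = r->ue_count;
	r->ue_count = 0;   /* simulated hardware clears the status after the read */
	return v;
}

/* The IRQ callback is the first reader, so it is the only place that sees
 * the real count; *logged stands in for the added print. */
static unsigned int umc_ecc_irq_cb(struct umc_regs *r, unsigned int *logged)
{
	unsigned int n = read_and_clear(r);
	*logged = n;
	return n;
}
```

Any later reader, such as the recovery work thread in the description, gets 0 from the same register.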

drm/amdgpu: restrict debugfs register access under SR-IOV · 95a2f917

Committed by Yintian Tao on Apr 07, 2020

On bare metal, nothing else needs to be taken care
of for GPU register access through MMIO. Under
virtualization, GPU registers are accessed through
KIQ at run time because of world switching.

Therefore, under SR-IOV, the user can read/write GPU
registers through debugfs only when all three
conditions below are met:
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
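The access rule in this commit reduces to a small predicate. The sketch encodes it over a hypothetical state struct; it is not the body of `amdgpu_virt_enable_access_debugfs()`, only the three listed conditions plus the bare-metal case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the commit message lists. */
struct virt_state {
	bool is_sriov;
	int gpu_recovery;    /* value of the amdgpu_gpu_recovery parameter */
	bool tdr_happened;
	bool in_gpu_reset;
};

/* Bare metal: always allowed (plain MMIO). SR-IOV: allowed only when all
 * three conditions from the commit message hold. */
static bool can_access_debugfs(const struct virt_state *s)
{
	if (!s->is_sriov)
		return true;
	return s->gpu_recovery == 0 && s->tdr_happened && !s->in_gpu_reset;
}
```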

09 Apr, 2020 · 25 commits

drm/amdgpu: increased atom cmd timeout · 9a785c7a

Committed by John Clements on Apr 09, 2020

Added a macro to define the timeout.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu_kms: Remove unnecessary condition check · ad36d71b

Committed by Aurabindo Pillai on Apr 07, 2020

Execution only reaches here if the asserted condition is true.
Hence there is no need for the additional check.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: unify fw_write_wait for new gfx9 asics · ba714a56

Committed by Aaron Liu on Apr 07, 2020

Make the fw_write_wait default case true, since presumably all new
gfx9 ASICs will have updated firmware, i.e., firmware that uses the
unique WAIT_REG_MEM packet with operation=1.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support access regs outside of mmio bar · 2eee0229

Committed by Hawking Zhang on Apr 08, 2020

Add indirect access support for registers outside of
the MMIO BAR.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
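A common way to reach registers beyond the BAR is an index/data register pair: write the target offset into an index register, then read or write through a data register. The sketch simulates that in plain C. The sizes, the index-register offset, and the helper names are hypothetical, and whether this exact pair is what the patch uses is an assumption. Real driver code must also hold a lock so that concurrent index/data sequences cannot interleave.

```c
#include <assert.h>
#include <stdint.h>

#define MMIO_DWORDS   16u   /* hypothetical BAR size, in dwords */
#define HIDDEN_DWORDS 256u  /* hypothetical registers beyond the BAR */
#define REG_INDEX     0u    /* hypothetical index register offset */

struct gpu {
	uint32_t mmio[MMIO_DWORDS];      /* directly mapped registers */
	uint32_t hidden[HIDDEN_DWORDS];  /* reachable only via index/data */
};

/* Simulated hardware behaviour of the data register. */
static uint32_t *data_port(struct gpu *g)
{
	uint32_t idx = g->mmio[REG_INDEX] - MMIO_DWORDS;
	assert(idx < HIDDEN_DWORDS);
	return &g->hidden[idx];
}

static uint32_t rreg(struct gpu *g, uint32_t reg)
{
	if (reg < MMIO_DWORDS)
		return g->mmio[reg];         /* direct MMIO access */
	g->mmio[REG_INDEX] = reg;            /* select the target register */
	return *data_port(g);                /* read through the data register */
}

static void wreg(struct gpu *g, uint32_t reg, uint32_t v)
{
	if (reg < MMIO_DWORDS) {
		g->mmio[reg] = v;
		return;
	}
	g->mmio[REG_INDEX] = reg;
	*data_port(g) = v;
}
```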

drm/amdgpu: retire AMDGPU_REGS_KIQ flag · f384ff95

Committed by Hawking Zhang on Apr 03, 2020

All register accesses through KIQ are redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retire RREG32_IDX/WREG32_IDX · ec59847e

Committed by Hawking Zhang on Apr 03, 2020

Those are not needed anymore.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retire indirect mmio reg support from cgs · 3c888c16

Committed by Hawking Zhang on Apr 03, 2020

Not needed anymore.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace indirect mmio access in non-dc code path · 46e840ed

Committed by Hawking Zhang on Apr 03, 2020

All the mmCUR_CONTROL instances are in the MMR range and
can be accessed directly by using RREG32/WREG32.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove inproper workaround for vega10 · dec0520a

Committed by Hawking Zhang on Apr 03, 2020

The workaround is not needed for SOC15 ASICs except
for Vega10, and it is not needed even for Vega10 with
the latest VBIOS.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix gfx hang during suspend with video playback (v2) · a23ca7f7

Committed by Prike Liang on Apr 07, 2020

The system hangs during S3 suspend because the SMU is left waiting on GC,
which does not respond to the CP_HQD_ACTIVE register access request. The
root cause is accessing the GC register after GFX has entered CGPG, and the
issue can be fixed by disabling GFX CGPG before performing the suspend.

v2: Disable GFX CGPG instead of using the RLC safe mode guard.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Re-enable FRU check for most models v5 · 1ea2b260

Committed by Kent Russell on Apr 03, 2020

There is at least one VG20 DID that does not have an FRU, and trying to read
it causes a hang. For now, explicitly support reading the FRU for
Arcturus and for the WKS VG20 DIDs, and skip it for everything else.
This re-enables serial number reporting for server cards.

v2: Add ASIC check
v3: Don't default to true for pre-VG20
v4: Use DID instead of parsing the VBIOS
v5: Squash in overflow warning fix
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset · b639c22c

Committed by Jack Zhang on Apr 07, 2020

[PATCH 2/2]
kfd_pre_reset frees the mem_objs allocated by kfd_gtt_sa_allocate.

Without this change, the SR-IOV TDR code path never frees that
allocated memory, which causes a memory leak.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd Avoid destroy hqd when GPU is on reset · fe9824d1

Committed by Jack Zhang on Apr 07, 2020

This reverts commit 5161bba4311f in order to split it into two
separate patches, which makes it easier to understand.

[PATCH 1/2]
Port to gfx10 from
commit 1b0bfcff ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")

Originally, the MEC was touched
without the GPU being initialized first.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update RAS related dmesg print · 4a06686b

Committed by John Clements on Apr 07, 2020

Prefix RAS error-related dmesg prints with PCI device info.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resolve mGPU RAS query instability · b3dbd6d3

Committed by John Clements on Apr 07, 2020

Upon receiving an uncorrectable error, query every GPU node for RAS errors.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
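The fix amounts to walking every node in the hive instead of only the one that raised the interrupt. In this sketch the node list and field names are hypothetical, and "query" is reduced to collecting and clearing a per-node pending count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GPU node in a multi-GPU hive. */
struct gpu_node {
	int id;
	unsigned int ue_count;     /* errors pending in this node's hardware */
	unsigned int ue_reported;  /* errors collected by the driver */
	struct gpu_node *next;
};

/* On an uncorrectable error seen by any node, query the whole hive so no
 * node's pending errors are missed; returns the total collected. */
static unsigned int query_all_nodes(struct gpu_node *head)
{
	unsigned int total = 0;
	for (struct gpu_node *n = head; n != NULL; n = n->next) {
		n->ue_reported += n->ue_count;
		total += n->ue_count;
		n->ue_count = 0;
	}
	return total;
}
```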

drm/amd/amdgpu: Correct gfx10's CG sequence · c419bdf5

Committed by Chengming Gui on Apr 03, 2020

An incorrect CG sequence causes a gfx timeout
if we keep switching the power profile mode
(entering a profile mode such as PEAK disables CG,
and exiting with EXIT enables CG)
while running a Vulkan test case (test case used: vkexample).
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add SPM golden settings for Navi12 · b2d92682

Committed by Tianci.Yin on Apr 07, 2020

Add RLC_SPM golden settings.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add SPM golden settings for Navi14 · a900f562

Committed by Tianci.Yin on Apr 07, 2020

Add RLC_SPM golden settings.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add SPM golden settings for Navi10(v2) · 4189425d

Committed by Tianci.Yin on Apr 02, 2020

Add RLC_SPM golden settings.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Tianci.Yin <tianci.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
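Golden settings are typically stored as (register, and_mask, or_mask) triples and applied as masked read-modify-writes. The sketch models that pattern. The triple layout and the "full mask means plain write" rule reflect how amdgpu generally programs golden register sequences as far as I know, but treat them as an assumption rather than the contents of these patches; the register values are made up.

```c
#include <assert.h>
#include <stdint.h>

/* One golden-setting entry: register index, bits to touch, new bit values. */
struct golden_reg {
	uint32_t reg;
	uint32_t and_mask;
	uint32_t or_mask;
};

/* Masked read-modify-write per entry; a full mask writes or_mask directly. */
static void program_golden(uint32_t *regs, const struct golden_reg *g, int n)
{
	for (int i = 0; i < n; i++) {
		uint32_t v;
		if (g[i].and_mask == 0xffffffffu) {
			v = g[i].or_mask;
		} else {
			v = regs[g[i].reg];
			v &= ~g[i].and_mask;                 /* clear the managed bits */
			v |= g[i].or_mask & g[i].and_mask;   /* set the golden values */
		}
		regs[g[i].reg] = v;
	}
}
```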

drm/amdgpu: Print UTCL2 client ID on a gpuvm fault · d2155a71

Committed by Oak Zeng on Apr 06, 2020

The UTCL2 client ID is useful for identifying which
UTCL2 client caused the GPUVM fault. Print it out
for debugging purposes.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian Konig <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: add shared memory restore after wake up from sleep. · 21b704d7

Committed by James Zhu on Apr 02, 2020

VCN shared memory needs to be restored after waking up during the S3 test.

v2: Allocate shared memory saved_bo at sw_init and free it in sw_fini.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
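The v2 lifecycle (allocate the backup once at sw_init, copy out on suspend, copy back on resume, free at sw_fini) can be sketched with ordinary heap buffers. The struct and function names are hypothetical; `shared_mem` merely stands in for the device-side shared buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical VCN instance; saved_bo is the backup the commit mentions. */
struct vcn_inst {
	unsigned char *shared_mem;   /* stands in for the device shared buffer */
	unsigned char *saved_bo;     /* allocated once at sw_init (v2) */
	size_t size;
};

static int vcn_sw_init(struct vcn_inst *v, size_t size)
{
	v->size = size;
	v->shared_mem = calloc(1, size);
	v->saved_bo = calloc(1, size);
	if (!v->shared_mem || !v->saved_bo) {
		free(v->shared_mem);
		free(v->saved_bo);
		v->shared_mem = v->saved_bo = NULL;
		return -1;
	}
	return 0;
}

/* Save before S3, restore after wake-up. */
static void vcn_suspend(struct vcn_inst *v) { memcpy(v->saved_bo, v->shared_mem, v->size); }
static void vcn_resume(struct vcn_inst *v)  { memcpy(v->shared_mem, v->saved_bo, v->size); }

static void vcn_sw_fini(struct vcn_inst *v)
{
	free(v->shared_mem);
	free(v->saved_bo);
	v->shared_mem = v->saved_bo = NULL;
}
```

Allocating at sw_init rather than at suspend time avoids an allocation failure in the middle of a suspend sequence.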

drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event · 2a20e630

Committed by Aaron Ma on Apr 03, 2020

On ARCTURUS and RENOIR, powerplay is not supported yet.
Plugging in or unplugging the power jack issues an ACPI event,
which then triggers a kernel NULL pointer BUG.
Check for NULL pointers before calling.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
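The fix is the usual double check on an optional function table: both the table pointer and the specific callback may be NULL. The sketch assumes a table with an `enable_bapm` callback for illustration; the struct layout, handler name, and demo backend are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a powerplay function table. */
struct pp_funcs {
	void (*enable_bapm)(void *handle, int enable);
};

struct pm_ctx {
	const struct pp_funcs *pp_funcs;   /* NULL when powerplay is unsupported */
	void *pp_handle;
};

/* ACPI power-source event handler: check the table and the callback
 * before calling, instead of dereferencing unconditionally. */
static int on_power_source_change(struct pm_ctx *pm, int on_ac)
{
	if (!pm->pp_funcs || !pm->pp_funcs->enable_bapm)
		return -1;                     /* unsupported: skip, no oops */
	pm->pp_funcs->enable_bapm(pm->pp_handle, on_ac);
	return 0;
}

/* Demo backend that records the last request. */
static int last_bapm = -1;
static void demo_enable_bapm(void *handle, int enable) { (void)handle; last_bapm = enable; }
static const struct pp_funcs demo_funcs = { .enable_bapm = demo_enable_bapm };
```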

drm/amdgpu/psp: dont warn on missing optional TA's · a45a9e5e

Committed by Alex Deucher on Apr 03, 2020

Replace dev_warn() with dev_info() and note that they are
optional to avoid confusing users.

The RAS TAs only exist on server boards, and the HDCP and DTM
TAs only exist on client boards. They are optional either way.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rework sched_list generation · 1c6d567b

Committed by Nirmoy Das on Apr 01, 2020

Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device, which makes amdgpu_ctx_init_entity()
much leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
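The shape described here (one device-wide table indexed by HW IP and priority, filled at ring init, with `num_scheds == 0` marking an empty slot and user-controlled indices sanitized) can be sketched as follows. All sizes and names are hypothetical, and `index_clamp()` only mimics the result of the kernel's `array_index_nospec()`: it does not provide the speculation barrier the real macro gives.

```c
#include <assert.h>
#include <stddef.h>

#define HW_IP_NUM  3   /* hypothetical: GFX, COMPUTE, DMA */
#define PRIO_NUM   2   /* hypothetical: normal, high */
#define MAX_SCHEDS 4

struct sched { int id; };

/* num_scheds == 0 marks a slot that ring init never populated (v3). */
struct sched_slot {
	struct sched *sched[MAX_SCHEDS];
	unsigned int num_scheds;
};

/* One big table in the device rather than per-context lists. */
struct gpu_device {
	struct sched_slot gpu_sched[HW_IP_NUM][PRIO_NUM];
};

/* Userspace stand-in for array_index_nospec(): idx if in range, else 0. */
static size_t index_clamp(size_t idx, size_t size)
{
	return idx < size ? idx : 0;
}

/* Ring init registers its scheduler in the device-wide table. */
static void ring_init(struct gpu_device *d, unsigned int hw_ip, unsigned int prio,
		      struct sched *s)
{
	struct sched_slot *slot = &d->gpu_sched[hw_ip][prio];
	assert(slot->num_scheds < MAX_SCHEDS);
	slot->sched[slot->num_scheds++] = s;
}

/* Context entity init: hw_ip and prio come from user space. */
static const struct sched_slot *ctx_lookup(const struct gpu_device *d,
					   size_t hw_ip, size_t prio)
{
	if (hw_ip >= HW_IP_NUM || prio >= PRIO_NUM)
		return NULL;
	hw_ip = index_clamp(hw_ip, HW_IP_NUM);
	prio = index_clamp(prio, PRIO_NUM);
	const struct sched_slot *slot = &d->gpu_sched[hw_ip][prio];
	return slot->num_scheds ? slot : NULL;
}
```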

drm/amdgpu: sync ring type and drm hw_ip type · 07e14845

Committed by Nirmoy Das on Mar 31, 2020

Use AMDGPU_HW_IP_* to set the amdgpu_ring_type enum values.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
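A small sketch of what "sync ring type and drm hw_ip type" means in practice. The three `AMDGPU_HW_IP_*` values are local copies of the first UAPI IP types as I understand them from `amdgpu_drm.h` (GFX 0, COMPUTE 1, DMA 2), redefined here only so the block compiles on its own; the rest of both enums is omitted. A presumable benefit is that a user-supplied IP type can index per-ring-type tables directly.

```c
#include <assert.h>

/* Local copies of the first few UAPI IP types, so the sketch is self-contained. */
#define AMDGPU_HW_IP_GFX     0
#define AMDGPU_HW_IP_COMPUTE 1
#define AMDGPU_HW_IP_DMA     2

/* Ring types defined in terms of the UAPI values instead of independent
 * numbering, so the two cannot drift apart. */
enum amdgpu_ring_type {
	AMDGPU_RING_TYPE_GFX     = AMDGPU_HW_IP_GFX,
	AMDGPU_RING_TYPE_COMPUTE = AMDGPU_HW_IP_COMPUTE,
	AMDGPU_RING_TYPE_SDMA    = AMDGPU_HW_IP_DMA,
};

/* Compile-time guard that the mapping holds (C11 static_assert). */
static_assert(AMDGPU_RING_TYPE_SDMA == AMDGPU_HW_IP_DMA,
	      "ring type must match the UAPI IP type");
```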
