提交 · dd4fa6c1b89a4f4034ec79c65405fdfc920204d1 · openeuler / Kernel

14 4月, 2020 2 次提交

drm/amd/amdgpu: remove hardcoded module name in prints · dd4fa6c1

由 Aurabindo Pillai 提交于 4月 08, 2020

Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.
Signed-off-by: NAurabindo Pillai <mail@aurabindo.in>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dd4fa6c1

drm/amdgpu: fix wrong vram lost counter increment V2 · dadce777

由 Evan Quan 提交于 4月 10, 2020

Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dadce777

09 4月, 2020 9 次提交

drm/amdgpu: support access regs outside of mmio bar · 2eee0229

由 Hawking Zhang 提交于 4月 08, 2020

add indirect access support to registers outside of
mmio bar.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2eee0229

drm/amdgpu: retire AMDGPU_REGS_KIQ flag · f384ff95

由 Hawking Zhang 提交于 4月 03, 2020

all the register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f384ff95

drm/amdgpu: retire RREG32_IDX/WREG32_IDX · ec59847e

由 Hawking Zhang 提交于 4月 03, 2020

those are not needed anymore
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ec59847e

drm/amdgpu: remove inproper workaround for vega10 · dec0520a

由 Hawking Zhang 提交于 4月 03, 2020

the workaround is not needed for soc15 ASICs except
for vega10. it is even not needed with latest vega10
vbios.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dec0520a

drm/amdgpu: fix gfx hang during suspend with video playback (v2) · a23ca7f7

由 Prike Liang 提交于 4月 07, 2020

The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.

v2: Use disable the GFX CGPG instead of RLC safe mode guard.
Signed-off-by: NPrike Liang <Prike.Liang@amd.com>
Tested-by: NMengbing Wang <Mengbing.Wang@amd.com>
Reviewed-by: NHuang Rui <ray.huang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a23ca7f7

drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset · b639c22c

由 Jack Zhang 提交于 4月 07, 2020

[PATCH 2/2]
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, sriov tdr code path will never free those
allocated memories and get memory leak.
Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b639c22c

drm/amdkfd Avoid destroy hqd when GPU is on reset · fe9824d1

由 Jack Zhang 提交于 4月 07, 2020

This reverts commit 5161bba4311f in order to split it into two
different patches, and this will make it easier to understand.

[PATCH 1/2]
porting to gfx10 from
commit 1b0bfcff ("drm/amdgpu: Avoid destroy hqd when GPU is on reset")

Originally, MEC is touched
without GPU initialized first.
Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

fe9824d1

drm/amdgpu: rework sched_list generation · 1c6d567b

由 Nirmoy Das 提交于 4月 01, 2020

Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.

v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum

v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list

v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings
Signed-off-by: NNirmoy Das <nirmoy.das@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1c6d567b

drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset · 04bef61e

由 Jack Zhang 提交于 4月 02, 2020

kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate

Without this change, sriov tdr code path will never free those allocated
memories and get memory leak.

v2:add a bugfix for kiq ring test fail
Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

04bef61e

02 4月, 2020 7 次提交

drm/amdgpu: extend compute job timeout · b7b2a316

由 Jiawei 提交于 3月 26, 2020

extend compute lockup timeout to 60000 for SR-IOV.
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NJiawei <Jiawei.Gu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b7b2a316

drm/amdgpu: postpone entering fullaccess mode · 2f294132

由 Monk Liu 提交于 3月 10, 2020

if host support new handshake we only need to enter
fullaccess_mode in ip_init() part, otherwise we need
to do it before reading vbios (becuase host prepares vbios
for VF only after received REQ_GPU_INIT event under
legacy handshake)
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2f294132

drm/amdgpu: adjust sequence of ip_discovery init and timeout_setting · dffa11b4

由 Monk Liu 提交于 3月 04, 2020

what:
1)move timtout setting before ip_early_init to reduce exclusive mode
cost for SRIOV

2)move ip_discovery_init() to inside of amdgpu_discovery_reg_base_init()
it is a prepare for the later upcoming patches.

why:
in later upcoming patches we would use a new mailbox event --
"req_gpu_init_data", which is a callback hooked in adev->virt.ops and
this callback send a new event "REQ_GPU_INIT_DAT" to host to notify
host to do some preparation like "IP discovery/vbios on the VF FB"
and this callback must be:

A) invoked after set_ip_block() because virt.ops is configured during
set_ip_block()

B) invoked before ip_discovery_init() becausen ip_discovery_init()
need host side prepares everything in VF FB first.

current place of ip_discovery_init() is before we can invoke callback
of adev->virt.ops, thus we must move ip_discovery_init() to a place
after the adev->virt.ops all settle done, and the perfect place is in
amdgpu_discovery_reg_base_init()
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dffa11b4

drm/amdgpu: equip new req_init_data handshake · 122078de

由 Monk Liu 提交于 3月 04, 2020

by this new handshake host side can prepare vbios/ip-discovery
and pf&vf exchange data upon recieving this request without
stopping world switch.

this way the world switch is less impacted by VF's exclusive mode
request
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

122078de

drm/amdgpu: disable ras query and iject during gpu reset · 61380faa

由 John Clements 提交于 3月 25, 2020

added flag to ras context to indicate if ras query functionality is ready
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

61380faa

drm/amdgpu: cleanup all virtualization detection routine · 3aa0115d

由 Monk Liu 提交于 3月 04, 2020

we need to move virt detection much earlier because:
1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always
be at DE5 (dw) mmio offset from vega10, this way there is no
need to implement detect_hw_virt() routine in each nbio/chip file.
for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at
0x1503

2) we need to acknowledged we are SRIOV VF before we do IP discovery because
the IP discovery content will be updated by host everytime after it recieved
a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches
for this new handshake soon).
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3aa0115d

drm/amdgpu: Enable reading FRU chip via I2C v3 · bd607166

由 Kent Russell 提交于 3月 13, 2020

Allow for reading of information like manufacturer, product number
and serial number from the FRU chip. Report the serial number as
the new sysfs file serial_number. Note that this only works on
server cards, as consumer cards do not feature the FRU chip, which
contains this information.

v2: Add documentation to amdgpu.rst, add helper functions,
    rename functions for consistency, fix bad starting offset
v3: Remove testing definitions
Signed-off-by: NKent Russell <kent.russell@amd.com>
Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bd607166

20 3月, 2020 1 次提交

drm/amdgpu: protect RAS sysfs during GPU reset · 43c4d576

由 John Clements 提交于 3月 19, 2020

MMHub EDC becomes dirty after BACO reset

EDC registers should be cleared early on in reset phase
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

43c4d576

17 3月, 2020 1 次提交

drm/amdgpu: revise RLCG access path · 2e0cc4d4

由 Monk Liu 提交于 3月 10, 2020

what changed:
1)provide new implementation interface for the rlcg access path
2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op
function can access reg that need RLCG path help

now even debugfs's reg_op can used to dump wave.
tested-by: NMonk Liu <monk.liu@amd.com>
tested-by: NZhou pengju <pengju.zhou@amd.com>
Signed-off-by: NZhou pengju <pengju.zhou@amd.com>
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2e0cc4d4

13 3月, 2020 2 次提交

drm/amdgpu: add fbdev suspend/resume on gpu reset · 565d1941

由 Evan Quan 提交于 3月 11, 2020

This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

565d1941

drm/amdgpu: add fbdev suspend/resume on gpu reset · 063e768e

由 Evan Quan 提交于 3月 11, 2020

This can fix the baco reset failure seen on Navi10.
And this should be a low risk fix as the same sequence
is already used for system suspend/resume.
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

063e768e

05 3月, 2020 1 次提交

drm/amdgpu: fix IB test MCBP bug · 752c683d

由 Monk Liu 提交于 2月 20, 2020

1)for gfx IB test we shouldn't insert DE meta data

2)we should make sure IB test finished before we
send event 3 to hypervisor otherwise the IDLE from
event 3 will preempt IB test, which is not designed
as a compatible structure for MCBP
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

752c683d

29 2月, 2020 1 次提交

drm/amdgpu: no need to clean debugfs at amdgpu · d2790e10

由 Yintian Tao 提交于 2月 27, 2020

drm_minor_unregister will invoke drm_debugfs_cleanup
to clean all the child node under primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean agian.
Otherwise, it will raise the NULL pointer like below.
[   45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[   45.047256] PGD 0 P4D 0
[   45.047713] Oops: 0002 [#1] SMP PTI
[   45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G        W  OE     4.18.0-15-generic #16~18.04.1-Ubuntu
[   45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[   45.050651] RIP: 0010:down_write+0x1f/0x40
[   45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[   45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[   45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[   45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[   45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[   45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[   45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[   45.059221] FS:  00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[   45.060809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[   45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.065897] Call Trace:
[   45.066426]  debugfs_remove+0x36/0xa0
[   45.067131]  amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[   45.068019]  amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[   45.068756]  amdgpu_pci_remove+0x49/0x70 [amdgpu]
[   45.069439]  pci_device_remove+0x3e/0xc0
[   45.070037]  device_release_driver_internal+0x18a/0x260
[   45.070842]  driver_detach+0x3f/0x80
[   45.071325]  bus_remove_driver+0x59/0xd0
[   45.071850]  driver_unregister+0x2c/0x40
[   45.072377]  pci_unregister_driver+0x22/0xa0
[   45.073043]  amdgpu_exit+0x15/0x57c [amdgpu]
[   45.073683]  __x64_sys_delete_module+0x146/0x280
[   45.074369]  do_syscall_64+0x5a/0x120
[   45.074916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal
Signed-off-by: NYintian Tao <yttao@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d2790e10

27 2月, 2020 6 次提交

drm/amdgpu: drop legacy drm load and unload callbacks · c6385e50