Commits · e09d40bdbac0e37a0179f4cd901e6422619a7ad2 · openeuler / Kernel

23 Apr, 2020 · 22 commits

drm/amdgpu: change how we update mmRLC_SPM_MC_CNTL · e09d40bd

Committed by Christian König on Apr 21, 2020

In pp_one_vf mode avoid the extra overhead and read/write the
registers without the KIQ.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Yintian Tao <yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set error query ready after all IPs late init · a891d239

Committed by Dennis Li on Apr 22, 2020

If error query ready is set in amdgpu_ras_late_init, some IP blocks
have their error query marked ready before they are initialized.

Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: code cleanup around gpu reset · 7dd8c205

Committed by Evan Quan on Apr 16, 2020

Make code more readable.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: optimize the gpu reset for XGMI setup V2 · 9e94d22c

Committed by Evan Quan on Apr 16, 2020

This is basically just a cosmetic code change. The current design
for GPU reset on an XGMI setup is to operate on the current device (adev)
first and then on the other devices from the hive (in another 'for' loop).
But we can instead reorder the device list (putting the current
device in first position) and handle all the devices in a single 'for'
loop.

V2: added missing hive->hive_lock protection

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
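
A minimal userspace sketch of the single-loop idea from 9e94d22c, assuming a plain array in place of the hive's device list. `struct dev`, `move_current_first` and `reset_all` are hypothetical names for illustration only, not amdgpu code, and the hive lock mentioned in V2 is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for one device in an XGMI hive (not the amdgpu struct). */
struct dev {
    int id;
    int reset_done;
};

/* Move the device with id `current_id` to index 0 and keep the relative
 * order of the others. Returns 0 on success, -1 if it is not in the list. */
static int move_current_first(struct dev *list, size_t n, int current_id)
{
    size_t i, pos = n;
    struct dev cur;

    for (i = 0; i < n; i++) {
        if (list[i].id == current_id) {
            pos = i;
            break;
        }
    }
    if (pos == n)
        return -1;

    cur = list[pos];
    for (i = pos; i > 0; i--)
        list[i] = list[i - 1];
    list[0] = cur;
    return 0;
}

/* One loop over every device, with the current device handled first. */
static int reset_all(struct dev *list, size_t n, int current_id)
{
    size_t i;

    if (move_current_first(list, n, current_id) != 0)
        return -1;
    for (i = 0; i < n; i++)
        list[i].reset_done = 1; /* placeholder for the per-device reset steps */
    return 0;
}
```

With a hive of devices 10, 20, 30 and device 30 as the current one, `reset_all(hive, 3, 30)` visits them in the order 30, 10, 20.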

drm/amdgpu: correct cancel_delayed_work_sync on gpu reset · 52fb44cf

Committed by Evan Quan on Apr 16, 2020

For an XGMI setup, it should also be performed on the other devices
from the hive.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct fbdev suspend on gpu reset · a2f63ee8

Committed by Evan Quan on Apr 16, 2020

For an XGMI setup, it needs to be performed on
all the devices from the same hive.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup coding style in amdkfd a bit · 10f39758

Committed by Bernard Zhao on Apr 21, 2020

Make the code a bit more readable by using a common
error handling pattern.

Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up unused variable about ring lru · e05185b3

Committed by Kevin Wang on Apr 20, 2020

Clean up unused variables:
1. ring_lru_list
2. ring_lru_list_lock

related-commit:
drm/amdgpu: remove ring lru handling

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace DRM prefix with PCI device info for gfx/mmhub · 4cc1178e

Committed by Dennis Li on Apr 18, 2020

Prefix RAS message printing in gfx/mmhub with PCI device info,
which assists debugging in the multi-GPU case.

Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disble vblank when unloading sriov driver · 7aba1918

Committed by Jiawei on Apr 17, 2020

Disable vblank in dce_vitual_crtc_commit(), which was skipped
under SR-IOV before.

Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Print CU information by default during initialization · d69b8971

Committed by Yong Zhao on Apr 17, 2020

This is convenient for multiple teams to obtain the information. Also,
add device info by using dev_info().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Adjust the SDMA doorbell info printing · e1046a1f

Committed by Yong Zhao on Apr 17, 2020

Turn off the printing by default because it is not very useful, while
adding more details.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix race between pstate and remote buffer map · d84a430d

Committed by Jonathan Kim on Mar 17, 2020

Vega20 arbitrates pstate at hive level and not device level. The last peer to
unmap a remote buffer could drop P-State while another process still has a
remote buffer mapped.

With this fix, P-States still need to be disabled for now as an SMU bug
was discovered on synchronous P2P transfers. This should be fixed in the
next FW update.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: Disable gfx off if VCN is busy" · 4f610503

Committed by James Zhu on Apr 11, 2020

This reverts commit 3fded222.
This is a workaround for vcn1 only. Currently vcn1 has separate
begin_use and idle work handle.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU · 12c17b9d

Committed by Guchun Chen on Apr 16, 2020

When running ras uncorrectable error injection and triggering GPU
reset on sGPU, the issue below is observed. It is caused by accessing
the list before it is initialized.

[   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[   80.047300] #PF: supervisor write access in kernel mode
[   80.047351] #PF: error_code(0x0003) - permissions violation
[   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[   80.047477] Oops: 0003 [#1] SMP PTI
[   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
[   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Disable FRU read on Arcturus · 69d0c18d

Committed by Kent Russell on Apr 16, 2020

Update the list with supported Arcturus chips, but disable for now until
final list is confirmed.

Ideally we can poll atombios for FRU support, instead of maintaining
this list of chips, but this will enable serial number reading for
supported ASICs for the time-being.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gmc: Fix spelling mistake. · 53c9c89a

Committed by Rajneesh Bhardwaj on Apr 05, 2020

Fixes a minor typo in the file.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2" · fdd21e62

Committed by Kent Russell on Apr 13, 2020

This reverts commit c12b84d6.

The original patch causes a RAS event and subsequent kernel hard-hang
when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
Arcturus.

dmesg output at hang time:
[drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
amdgpu 0000:67:00.0: GPU reset begin!
Evicting PASID 0x8000 queues
Started evicting pasid 0x8000
qcm fence wait loop timeout expired
The cp might be in an unrecoverable state due to an unsuccessful queues preemption
Failed to evict process queues
Failed to suspend process 0x8000
Finished evicting pasid 0x8000
Started restoring pasid 0x8000
Finished restoring pasid 0x8000
[drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
amdgpu: [powerplay] Failed to send message 0x26, response 0x0
amdgpu: [powerplay] Failed to set soft min gfxclk !
amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
amdgpu: [powerplay] Failed to send message 0x7, response 0x0
amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9: add gfxoff quirk · 079c72ad

Committed by Alex Deucher on Apr 09, 2020

Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set mp1 state before reload · 7f70443f

Committed by John Clements on Apr 14, 2020

Set MP1 state to prepare for unload before reloading SMU FW.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update psp fw loading sequence · 40e611bd

Committed by John Clements on Apr 14, 2020

Added a dedicated function to check whether a particular fw should be skipped from loading.

Added a dedicated function for SMU FW loading via PSP.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix the hw hang during perform system reboot and reset · ced1ba97

Committed by Prike Liang on Apr 13, 2020

The system reboot failed because some IP blocks entered power gating before
the hw resource destroy was performed. Meanwhile, using a unified interface to
set the device CGPG to the ungated state simplifies the amdgpu poweroff or
reset ungate guard.

Fixes: 487eca11 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Apr, 2020 · 13 commits

drm/amdgpu: remove dead code in si_dpm.c · 8e2f8420

Committed by Jason Yan on Apr 13, 2020

This code is dead, let's remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: remove hardcoded module name in prints · dd4fa6c1

Committed by Aurabindo Pillai on Apr 08, 2020

Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add print prefix for dev_* variants · 539489fc

Committed by Aurabindo Pillai on Apr 08, 2020

Define dev_fmt macro for informative print messages.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add prefix for pr_* prints · d57229b1

Committed by Aurabindo Pillai on Apr 08, 2020

amdgpu uses lots of pr_* calls for printing error messages.
With this prefix, the origin of an error is more obvious to the end
user, which may help debugging.

Prefix format:

[xxx.xxxxx] amdgpu: ...

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
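
The prefix in d57229b1 relies on the kernel's `pr_fmt` convention, where the `pr_*` macros wrap their format string in `pr_fmt()` before printing. Below is a userspace imitation of that mechanism, a sketch only: `format_err` is a hypothetical helper that writes into a buffer instead of the kernel log, and `##__VA_ARGS__` is a GNU extension accepted by gcc and clang.

```c
#include <stdio.h>

/* The prefix is glued onto the format string at compile time through
 * adjacent string-literal concatenation, as with the kernel's pr_fmt. */
#define pr_fmt(fmt) "amdgpu: " fmt

/* Hypothetical stand-in for pr_err(): formats into `buf` so the result
 * can be inspected, rather than printing to the kernel log. */
#define format_err(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)
```

Calling `format_err(buf, sizeof(buf), "ring %d timeout\n", 3)` leaves `amdgpu: ring 3 timeout` followed by a newline in `buf`; the call site never spells out the module name.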

drm/amdgpu/ring: simplify scheduler setup logic · a4c24680

Committed by Alex Deucher on Apr 09, 2020

Set up a GPU scheduler based on the ring flag rather
than the ring type.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/kiq: add no_scheduler flag to KIQ · a783910d

Committed by Alex Deucher on Apr 09, 2020

We don't want a GPU scheduler for this ring.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/ring: add no_scheduler flag · cb3d1085

Committed by Alex Deucher on Apr 09, 2020

This allows IPs to flag whether a specific ring requires
a GPU scheduler or not. E.g., sometimes instances of an
IP are asymmetric and have different capabilities.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
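
A small sketch of the per-ring flag idea from cb3d1085, assuming a simplified ring descriptor. `struct ring` and `count_sched_rings` are hypothetical names for illustration and do not reflect the real amdgpu ring structure; the point is only that the decision is read from a flag on the ring rather than derived from its type.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ring descriptor (not the amdgpu struct). */
struct ring {
    const char *name;
    bool no_scheduler; /* set by the IP when this ring must not get a scheduler */
};

/* Count the rings that should be given a GPU scheduler, deciding
 * per ring from its flag instead of from the ring type. */
static size_t count_sched_rings(const struct ring *rings, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (!rings[i].no_scheduler)
            count++;
    }
    return count;
}
```

For three rings where only a KIQ-like ring sets `no_scheduler`, the function reports two rings needing a scheduler.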

drm/amdgpu: fix wrong vram lost counter increment V2 · dadce777

Committed by Evan Quan on Apr 10, 2020

Vram lost counter is wrongly increased by two during baco reset.

V2: assumed vram lost for mode1 reset on all ASICs

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS · ed72aa21

Committed by Guchun Chen on Apr 13, 2020

Prefix RAS message printing in GFX IP with PCI device info,
which assists debugging in the multi-GPU case.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resume kiq access debugfs · d32709da

Committed by Yintian Tao on Apr 13, 2020

If there is no GPU hang, the user can still access
debugfs through kiq.

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refine ras related message print · 6952e99c

Committed by Guchun Chen on Apr 10, 2020

Prefix ras related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This clearly tells the user which
GPU device the ras message belongs to. Also add some
other ras message printing to make the output clearer
and friendlier.

Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb · 1f3ef0ef

Committed by Guchun Chen on Apr 10, 2020

Uncorrectable error count printing is missing when issuing UMC
UE injection. By the time the error count log function runs in the GPU
recovery work thread, the error count from the last injection can no
longer be read and printed, because the error status
register is automatically cleared after it is read in the UMC ecc irq
callback. So add the message printing to the UMC ecc irq cb, to be
consistent with other RAS error interrupt cases.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: restrict debugfs register access under SR-IOV · 95a2f917

Committed by Yintian Tao on Apr 07, 2020

Under bare metal, GPU register access through MMIO
needs no extra care.
Under virtualization, GPU register access is
implemented through KIQ at run time because of
world-switch.

Therefore, under SR-IOV the user can only access
debugfs to r/w GPU registers when all
three conditions below are met.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result

Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
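
The three conditions listed in 95a2f917 can be read as a single predicate. The sketch below assumes a simplified state struct whose fields mirror the wording of the commit message; `struct vf_state` and `debugfs_reg_access_allowed` are hypothetical names, not the driver's actual code, and treating bare metal as unrestricted is this sketch's reading of the first paragraph of the message.

```c
#include <stdbool.h>

/* Hypothetical state for this sketch; field names follow the commit
 * message, not the actual amdgpu structures. */
struct vf_state {
    bool is_sriov_vf;  /* running as an SR-IOV virtual function */
    int gpu_recovery;  /* value of the amdgpu_gpu_recovery setting */
    bool tdr_happened; /* a timeout (TDR) has been detected */
    bool in_gpu_reset; /* a reset is currently in progress */
};

/* True when debugfs register r/w would be allowed under the rule
 * described in the commit message. */
static bool debugfs_reg_access_allowed(const struct vf_state *s)
{
    if (!s->is_sriov_vf)
        return true; /* bare metal: assumed unrestricted in this sketch */
    return s->gpu_recovery == 0 && s->tdr_happened && !s->in_gpu_reset;
}
```

Under SR-IOV, flipping any one of the three inputs away from the listed value makes the predicate return false.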

09 Apr, 2020 · 5 commits

drm/amdgpu: increased atom cmd timeout · 9a785c7a

Committed by John Clements on Apr 09, 2020

Added a macro to define the timeout.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu_kms: Remove unnecessary condition check · ad36d71b

Committed by Aurabindo Pillai on Apr 07, 2020

Execution will only reach here if the asserted condition is true.
Hence there is no need for the additional check.

Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: unify fw_write_wait for new gfx9 asics · ba714a56

Committed by Aaron Liu on Apr 07, 2020

Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is, using a unique WAIT_REG_MEM
packet with operation=1.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Aaron Liu <aaron.liu@amd.com>
Tested-by: Yuxian Dai <Yuxian.Dai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support access regs outside of mmio bar · 2eee0229

Committed by Hawking Zhang on Apr 08, 2020

Add indirect access support to registers outside of
mmio bar.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: retire AMDGPU_REGS_KIQ flag · f384ff95

Committed by Hawking Zhang on Apr 03, 2020

All register access through kiq is redirected
to amdgpu_kiq_rreg/amdgpu_kiq_wreg.

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel · synced successfully about 2 years ago
