Commits · a2676149323f04bf229bdad7f74b7ad14edd54d3 · openeuler / Kernel

24 Apr, 2020 6 commits

drm/amdgpu: retire support_vmr_ring interface · a2676149

Committed by Hawking Zhang on Apr 20, 2020

The vmr ring is dedicated to the sriov vf (i.e. the guest driver
in sriov). It is a general communication interface
between the driver and psp fw across all ip versions,
so it is not correct to make it an ip-specific callback.
It is even worse to check a specific tOS version per IP
version (like psp_v11/v12).
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: shrink critical section in amdgpu_amdkfd_gpuvm_free_memory_of_gpu · fe158997

Committed by Bernard Zhao on Apr 20, 2020

Reduce the code area protected by mem->lock; there is no need to protect pr_debug.
This also simplifies error handling.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
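
The idea of shrinking a critical section can be illustrated with a minimal user-space sketch. It is not the driver code: `struct mem_obj`, `mapped_to_gpu` and `free_memory()` are hypothetical names, and a pthread mutex stands in for the kernel mutex. Only the read of shared state happens under the lock; the logging and the error return happen after the unlock, so there is a single unlock path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel object; not the real driver struct. */
struct mem_obj {
    pthread_mutex_t lock;
    int mapped_to_gpu;   /* number of GPUs the buffer is still mapped on */
};

/* Returns 0 on success, -1 (standing in for an error code) if still mapped. */
int free_memory(struct mem_obj *mem)
{
    int mapped;

    assert(mem != NULL);

    /* Critical section: only the read of shared state. */
    pthread_mutex_lock(&mem->lock);
    mapped = mem->mapped_to_gpu;
    pthread_mutex_unlock(&mem->lock);

    if (mapped > 0) {
        /* Logging runs outside the lock, so no unlock is needed on this path. */
        fprintf(stderr, "buffer is still mapped on %d GPU(s)\n", mapped);
        return -1;
    }
    return 0;
}
```

Copying the value into a local variable before unlocking is what lets the message and the early return move out of the protected region.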

drm/amdgpu: Init data to avoid oops while reading pp_num_states. · 6f81b2d0

Committed by limingyu on Apr 22, 2020

For chips like CHIP_OLAND with si enabled (amdgpu.si_support=1),
amdgpu exposes pp_num_states under the /sys directory.
At that point, reading the pp_num_states file executes the
amdgpu_get_pp_num_states function. In our case, the data hasn't
been initialized, so the kernel accesses an illegal
address, triggers a segmentation fault, and the system reboots soon after:

    uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/pp_num_states

    Message from syslogd@uos-PC at Apr 22 09:26:20 ...
     kernel:[   82.154129] Internal error: Oops: 96000004 [#1] SMP

This patch fixes the problem so that reading the file no longer
triggers the kernel segmentation fault.
Signed-off-by: limingyu <limingyu@uniontech.com>
Signed-off-by: zhoubinbin <zhoubinbin@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
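
The general pattern behind this kind of fix can be sketched in user-space C. This is an illustration under assumptions, not the actual patch: `struct states_info`, `query_fn`, `two_states()` and `read_num_states()` are hypothetical names. The point is that the result structure has defined contents even when no backend fills it in.

```c
#include <stddef.h>
#include <string.h>

#define MAX_STATES 16  /* hypothetical upper bound */

/* Hypothetical result structure filled in by a power-management backend. */
struct states_info {
    unsigned int nums;
    int states[MAX_STATES];
};

/* Hypothetical query hook; NULL models a backend with no implementation. */
typedef int (*query_fn)(struct states_info *out);

/* Example hook that reports two states. */
int two_states(struct states_info *out)
{
    out->nums = 2;
    return 0;
}

unsigned int read_num_states(query_fn query)
{
    struct states_info data;

    /* Zero the structure first so a missing hook cannot leave garbage in it. */
    memset(&data, 0, sizeof(data));
    if (query)
        query(&data);
    return data.nums;
}
```

With the `memset` in place, a reader that gets no data sees a count of zero instead of whatever happened to be on the stack.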

drm/amdgpu: remove set but not used variable 'priority' · 00aba6da

Committed by YueHaibing on Apr 21, 2020

drivers/gpu/drm/amd/amdgpu/amdgpu_job.c: In function amdgpu_job_submit:
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c:148:26: warning: variable priority set but not used [-Wunused-but-set-variable]

Commit 33abcb1f ("drm/amdgpu: set compute queue priority at mqd_init")
left this behind; remove it.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm: amdgpu: fix kernel-doc struct warning · 408d9121

Committed by Randy Dunlap on Apr 19, 2020

Fix a kernel-doc warning about a missing struct field description:

../drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:92: warning: Function parameter or member 'vm' not described in 'amdgpu_vm_eviction_lock'

Fixes: a269e449 ("drm/amdgpu: Avoid reclaim fs while eviction lock")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: request reg_val_offs each kiq read reg · 54208194

Committed by Yintian Tao on Apr 22, 2020

With the current kiq register read method,
there is a race condition when using KIQ to read
registers if multiple clients want to read at the same time,
as in the example below:
1. client-A starts to read REG-0 through KIQ
2. client-A polls seqno-0
3. client-B starts to read REG-1 through KIQ
4. client-B polls seqno-1
5. the kiq completes these two read operations
6. client-A reads the register value from the wb buffer and
   gets the REG-1 value

Therefore, use amdgpu_device_wb_get() to request reg_val_offs
for each kiq register read.

v2: fix the error remove
v3: fix the print typo
v4: remove unused variables
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
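
The race in steps 1 to 6 comes from two readers sharing one result location. A minimal user-space sketch of the per-read allocation idea follows. It is an assumption-laden illustration, not the driver implementation: `struct wb_pool`, `wb_get()`, `wb_free()` and `kiq_complete_read()` are hypothetical names, the pool size is arbitrary, and the sketch omits the locking a real concurrent allocator would need.

```c
#include <assert.h>
#include <stdint.h>

#define WB_SLOTS 8  /* hypothetical pool size */

/* Hypothetical writeback pool standing in for the buffer the KIQ writes to. */
struct wb_pool {
    uint32_t used;             /* bit i set means slot i is owned by a reader */
    uint32_t value[WB_SLOTS];  /* one result word per slot */
};

/* Reserve a private slot; returns 0 and stores the index, or -1 if full. */
int wb_get(struct wb_pool *wb, uint32_t *slot)
{
    for (uint32_t i = 0; i < WB_SLOTS; i++) {
        if (!(wb->used & (1u << i))) {
            wb->used |= 1u << i;
            *slot = i;
            return 0;
        }
    }
    return -1;
}

/* Release a slot once its value has been consumed. */
void wb_free(struct wb_pool *wb, uint32_t slot)
{
    assert(slot < WB_SLOTS);
    wb->used &= ~(1u << slot);
}

/* Stand-in for the engine completing a read into the caller's own slot. */
void kiq_complete_read(struct wb_pool *wb, uint32_t slot, uint32_t reg_value)
{
    assert(slot < WB_SLOTS);
    wb->value[slot] = reg_value;
}
```

Because client-A and client-B each hold a different slot, the completion of client-B's read can no longer overwrite the value client-A is about to fetch.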

23 Apr, 2020 22 commits

drm/amdgpu: change how we update mmRLC_SPM_MC_CNTL · e09d40bd

Committed by Christian König on Apr 21, 2020

In pp_one_vf mode, avoid the extra overhead and read/write the
registers without the KIQ.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Yintian Tao <yintian.tao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set error query ready after all IPs late init · a891d239

Committed by Dennis Li on Apr 22, 2020

Setting error query ready in amdgpu_ras_late_init means that
some IP blocks are not initialized yet even though their error query
is already marked ready.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: code cleanup around gpu reset · 7dd8c205

Committed by Evan Quan on Apr 16, 2020

Make the code more readable.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: optimize the gpu reset for XGMI setup V2 · 9e94d22c

Committed by Evan Quan on Apr 16, 2020

This is basically just a cosmetic code change. The current design
for XGMI setup gpu reset is to operate on the current device (adev)
first and then on the other devices from the hive (in another 'for' loop).
But we can actually reorder the device list (putting the current
device in 1st position) and handle all the devices in a single 'for'
loop.

V2: added missing hive->hive_lock protection
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
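
The reordering described above can be sketched in plain C with an array of pointers in place of the kernel's linked list. All names here (`struct dev`, `move_to_front()`, `reset_all()`) are hypothetical, and the "reset" is just a flag; the sketch only shows how putting the current device first allows one loop to replace two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in an XGMI hive. */
struct dev {
    int id;
    int reset_done;
};

/* Move `cur` to index 0, keeping the relative order of the other entries. */
void move_to_front(struct dev **list, size_t n, const struct dev *cur)
{
    size_t i;

    assert(list != NULL);
    for (i = 0; i < n; i++)
        if (list[i] == cur)
            break;
    if (i == n)
        return;  /* cur is not in the list; leave the order unchanged */

    struct dev *tmp = list[i];
    for (; i > 0; i--)
        list[i] = list[i - 1];
    list[0] = tmp;
}

/* Handle every device in one loop; `order` records the visiting order. */
void reset_all(struct dev **list, size_t n, struct dev *cur, int *order)
{
    move_to_front(list, n, cur);
    for (size_t i = 0; i < n; i++) {
        list[i]->reset_done = 1;
        order[i] = list[i]->id;
    }
}
```

For a hive holding devices 0, 1 and 2 with device 2 as the current one, the single loop visits them in the order 2, 0, 1.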

drm/amdgpu: correct cancel_delayed_work_sync on gpu reset · 52fb44cf

Committed by Evan Quan on Apr 16, 2020

For an XGMI setup, it should also be performed on the other devices
in the hive.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct fbdev suspend on gpu reset · a2f63ee8

Committed by Evan Quan on Apr 16, 2020

For an XGMI setup, it needs to be performed on
all the devices in the same hive.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup coding style in amdkfd a bit · 10f39758

Committed by Bernard Zhao on Apr 21, 2020

Make the code a bit more readable by using a common
error handling pattern.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up unused variable about ring lru · e05185b3

Committed by Kevin Wang on Apr 20, 2020

Clean up unused variables:
1. ring_lru_list
2. ring_lru_list_lock

Related commit:
drm/amdgpu: remove ring lru handling
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace DRM prefix with PCI device info for gfx/mmhub · 4cc1178e

Committed by Dennis Li on Apr 18, 2020

Prefix RAS message printing in gfx/mmhub with PCI device info,
which assists debugging in the multi-GPU case.
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disble vblank when unloading sriov driver · 7aba1918

Committed by Jiawei on Apr 17, 2020

Disable vblank in dce_virtual_crtc_commit(), which was previously
skipped under sriov.
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Jiawei <Jiawei.Gu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Print CU information by default during initialization · d69b8971

Committed by Yong Zhao on Apr 17, 2020

This is convenient for multiple teams to obtain the information. Also,
add device info by using dev_info().
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Adjust the SDMA doorbell info printing · e1046a1f

Committed by Yong Zhao on Apr 17, 2020

Turn off the printing by default because it is not very useful, and
add more details to it.
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix race between pstate and remote buffer map · d84a430d

Committed by Jonathan Kim on Mar 17, 2020

Vega20 arbitrates pstate at hive level and not device level. The last peer to
unmap a remote buffer could drop the P-State while another process still has
a remote buffer mapped.

With this fix, P-States still need to be disabled for now, as an SMU bug
was discovered on synchronous P2P transfers. This should be fixed in the
next FW update.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: Disable gfx off if VCN is busy" · 4f610503

Committed by James Zhu on Apr 11, 2020

This reverts commit 3fded222.
This is a workaround for vcn1 only. Currently vcn1 has separate
begin_use and idle work handlers.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Tested-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU · 12c17b9d

Committed by Guchun Chen on Apr 16, 2020

When running ras uncorrectable error injection and triggering GPU
reset on sGPU, the issue below is observed. It is caused by accessing
a list that has not been initialized.

[   80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750
[   80.047300] #PF: supervisor write access in kernel mode
[   80.047351] #PF: error_code(0x0003) - permissions violation
[   80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061
[   80.047477] Oops: 0003 [#1] SMP PTI
[   80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G           OE     5.4.0-rc7-guchchen #1
[   80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
[   80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Disable FRU read on Arcturus · 69d0c18d

Committed by Kent Russell on Apr 16, 2020

Update the list with supported Arcturus chips, but disable it for now until
the final list is confirmed.

Ideally we can poll atombios for FRU support instead of maintaining
this list of chips, but this will enable serial number reading for
supported ASICs for the time being.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gmc: Fix spelling mistake. · 53c9c89a

Committed by Rajneesh Bhardwaj on Apr 05, 2020

Fixes a minor typo in the file.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2" · fdd21e62

Committed by Kent Russell on Apr 13, 2020

This reverts commit c12b84d6.

The original patch causes a RAS event and a subsequent kernel hard hang
when running the KFDMemoryTest.PtraceAccessInvisibleVram test on VG20 and
Arcturus.

dmesg output at hang time:
[drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
amdgpu 0000:67:00.0: GPU reset begin!
Evicting PASID 0x8000 queues
Started evicting pasid 0x8000
qcm fence wait loop timeout expired
The cp might be in an unrecoverable state due to an unsuccessful queues preemption
Failed to evict process queues
Failed to suspend process 0x8000
Finished evicting pasid 0x8000
Started restoring pasid 0x8000
Finished restoring pasid 0x8000
[drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
amdgpu: [powerplay] Failed to send message 0x26, response 0x0
amdgpu: [powerplay] Failed to set soft min gfxclk !
amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
amdgpu: [powerplay] Failed to send message 0x7, response 0x0
amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
[drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9: add gfxoff quirk · 079c72ad

Committed by Alex Deucher on Apr 09, 2020

Fix screen corruption with firefox.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: set mp1 state before reload · 7f70443f

Committed by John Clements on Apr 14, 2020

Set MP1 state to prepare for unload before reloading SMU FW.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update psp fw loading sequence · 40e611bd

Committed by John Clements on Apr 14, 2020

Added a dedicated function to check whether a particular fw should be skipped during loading.

Added a dedicated function for SMU FW loading via PSP.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix the hw hang during perform system reboot and reset · ced1ba97

Committed by Prike Liang on Apr 13, 2020

The system reboot failed because some IP blocks entered power gating before
the hw resources were destroyed. In addition, using a unified interface to set
the device CGPG to the ungated state simplifies the amdgpu poweroff or reset ungate guard.

Fixes: 487eca11 ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)")
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Tested-by: Mengbing Wang <Mengbing.Wang@amd.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Apr, 2020 12 commits

drm/amdgpu: remove dead code in si_dpm.c · 8e2f8420

Committed by Jason Yan on Apr 13, 2020

This code is dead, so let's remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: remove hardcoded module name in prints · dd4fa6c1

Committed by Aurabindo Pillai on Apr 08, 2020

Let format prefixes take care of printing the module name
through pr_fmt and dev_fmt definitions.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add print prefix for dev_* variants · 539489fc

Committed by Aurabindo Pillai on Apr 08, 2020

Define the dev_fmt macro for informative print messages.
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add prefix for pr_* prints · d57229b1

Committed by Aurabindo Pillai on Apr 08, 2020

amdgpu uses lots of pr_* calls for printing error messages.
With this prefix, the origin of an error is more obvious to the end
user, which may help debugging.

Prefix format:

[xxx.xxxxx] amdgpu: ...
Signed-off-by: Aurabindo Pillai <mail@aurabindo.in>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
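
The prefix mechanism can be imitated in user space to show how it works. This is a sketch under assumptions rather than kernel code: `pr_err_buf()` and `log_fw_error()` are hypothetical, output goes to a caller-supplied buffer instead of the kernel log, and `##__VA_ARGS__` is a GNU extension (the same one the kernel's own print macros rely on). The `pr_fmt` definition relies on adjacent string literals being concatenated at compile time.

```c
#include <stddef.h>
#include <stdio.h>

/* User-space imitation of the pr_fmt convention: prepend a module prefix. */
#define pr_fmt(fmt) "amdgpu: " fmt

/* Hypothetical pr_err stand-in that formats into a caller-supplied buffer. */
#define pr_err_buf(buf, size, fmt, ...) \
    snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

/* Hypothetical caller: the message itself no longer hardcodes the module name. */
int log_fw_error(char *buf, size_t size, const char *name)
{
    return pr_err_buf(buf, size, "failed to load %s\n", name);
}
```

Calling `log_fw_error(buf, sizeof(buf), "firmware")` leaves `amdgpu: failed to load firmware` followed by a newline in `buf`, with the prefix supplied in one place for every message.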

drm/amdgpu/ring: simplify scheduler setup logic · a4c24680

Committed by Alex Deucher on Apr 09, 2020

Set up a GPU scheduler based on the ring flag rather
than the ring type.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/kiq: add no_scheduler flag to KIQ · a783910d

Committed by Alex Deucher on Apr 09, 2020

We don't want a GPU scheduler for this ring.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/ring: add no_scheduler flag · cb3d1085

Committed by Alex Deucher on Apr 09, 2020

This allows IPs to flag whether a specific ring requires
a GPU scheduler or not. E.g., sometimes instances of an
IP are asymmetric and have different capabilities.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
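
A per-ring flag of this kind might be consumed roughly as follows. This is a user-space sketch with hypothetical types: only the `no_scheduler` name comes from the commit titles, while `struct ring`, `has_scheduler` and `setup_schedulers()` are made up for illustration and stand in for the real scheduler setup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring descriptor; only the flag name mirrors the commit text. */
struct ring {
    const char *name;
    bool no_scheduler;   /* set by the IP when this ring must not get a scheduler */
    bool has_scheduler;  /* outcome of setup, kept here for illustration only */
};

/*
 * Decide per ring from the flag rather than from the ring type.
 * Returns the number of rings that received a scheduler.
 */
size_t setup_schedulers(struct ring *rings, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        rings[i].has_scheduler = !rings[i].no_scheduler;
        if (rings[i].has_scheduler)
            count++;
    }
    return count;
}
```

Keeping the decision in a flag means that two rings of the same type can be treated differently, which is the asymmetric case the message mentions.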

drm/amdgpu: fix wrong vram lost counter increment V2 · dadce777

Committed by Evan Quan on Apr 10, 2020

The vram lost counter is wrongly increased by two during baco reset.

V2: assume vram lost for mode1 reset on all ASICs
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS · ed72aa21

Committed by Guchun Chen on Apr 13, 2020

Prefix RAS message printing in GFX IP with PCI device info,
which assists debugging in the multi-GPU case.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: resume kiq access debugfs · d32709da

Committed by Yintian Tao on Apr 13, 2020

If there is no GPU hang, the user can still access
debugfs through kiq.
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refine ras related message print · 6952e99c

Committed by Guchun Chen on Apr 10, 2020

Prefix ras related kernel message logging with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err. This clearly tells the user which
GPU device a ras message relates to. Also add some
other ras message printing to make the output clearer
and friendlier.
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb · 1f3ef0ef

Committed by Guchun Chen on Apr 10, 2020

Uncorrectable error count printing is missing when issuing UMC
UE injection. When the error count log function runs in the GPU
recovery work thread, there is no chance to get and print the correct
error count value from the last error injection, because the error status
register is automatically cleared after being read in the UMC ecc irq
callback. So add such message printing in the UMC ecc irq cb to be
consistent with other RAS error interrupt cases.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel: synced successfully nearly 2 years ago
