提交 · 4c681812443b2800364096d52326721d2b842f5d · openeuler / Kernel

14 1月, 2022 2 次提交

drm/amdgpu: init iommu after amdkfd device init · 4c681812

由 Yifan Zhang 提交于 1月 14, 2022

stable inclusion
from stable-v5.10.85
commit 7508a9aa65b959bbc6d9e42c9683520bddb7db0d
bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7508a9aa65b959bbc6d9e42c9683520bddb7db0d

--------------------------------

commit 714d9e45 upstream.

This patch is to fix clinfo failure in Raven/Picasso:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.2 AMD-APP (3364.0)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback

  Platform Name: AMD Accelerated Parallel Processing Number of devices: 0
Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: NJames Zhu <James.Zhu@amd.com>
Tested-by: NJames Zhu <James.Zhu@amd.com>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4c681812

drm/amdgpu: move iommu_resume before ip init/resume · 143e8fb5

由 James Zhu 提交于 1月 14, 2022

stable inclusion
from stable-v5.10.85
commit ac9db04ee32f007e48cb0763784ccfadd5a21342
bugzilla: 186032 https://gitee.com/openeuler/kernel/issues/I4QVI4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ac9db04ee32f007e48cb0763784ccfadd5a21342

--------------------------------

commit f02abeb0 upstream.

Separate iommu_resume from kfd_resume, and move it before
other amdgpu ip init/resume.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277Signed-off-by: NJames Zhu <James.Zhu@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

143e8fb5

19 10月, 2021 1 次提交

drm/amdgpu: Cancel delayed work when GFXOFF is disabled · 147b0a1e

由 Michel Dänzer 提交于 10月 19, 2021

mainline inclusion
from mainline-5.10.62
commit da3067eadcc156b742657c0694beae0a7c49d157
bugzilla: 182217 https://gitee.com/openeuler/kernel/issues/I4EFOS

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da3067eadcc156b742657c0694beae0a7c49d157

--------------------------------

commit 32bc8f83 upstream.

schedule_delayed_work does not push back the work if it was already
scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
was disabled and re-enabled again during those 100 ms.

This resulted in frame drops / stutter with the upcoming mutter 41
release on Navi 14, due to constantly enabling GFXOFF in the HW and
disabling it again (for getting the GPU clock counter).

To fix this, call cancel_delayed_work_sync when the disable count
transitions from 0 to 1, and only schedule the delayed work on the
reverse transition, not if the disable count was already 0. This makes
sure the delayed work doesn't run at unexpected times, and allows it to
be lock-free.

v2:
* Use cancel_delayed_work_sync & mutex_trylock instead of
  mod_delayed_work.
v3:
* Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
v4:
* Fix race condition between amdgpu_gfx_off_ctrl incrementing
  adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
  checking for it to be 0 (Evan Quan)

Cc: stable@vger.kernel.org
Reviewed-by: NEvan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3
Acked-by: Christian König <christian.koenig@amd.com> # v3
Signed-off-by: NMichel Dänzer <mdaenzer@redhat.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

147b0a1e

15 10月, 2021 2 次提交

drm/amdgpu: Fix resource leak on probe error path · 9f5b8c1b

由 Jiri Kosina 提交于 10月 15, 2021

stable inclusion
from stable-5.10.56
commit fc2756cce06f9833ebabd309b5b5080ed5c56897
bugzilla: 176004 https://gitee.com/openeuler/kernel/issues/I4DYZ4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc2756cce06f9833ebabd309b5b5080ed5c56897

--------------------------------

commit d47255d3 upstream.

This reverts commit 4192f7b5.

It is not true (as stated in the reverted commit changelog) that we never
unmap the BAR on failure; it actually does happen properly on
amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
amdgpu_device_fini() error path.

What's worse, this commit actually completely breaks resource freeing on
probe failure (like e.g. failure to load microcode), as
amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
early, leaving all the resources that'd normally be freed in
amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
to all sorts of oopses when someone tries to, for example, access the
sysfs and procfs resources which are still around while the driver is
gone.

Fixes: 4192f7b5 ("drm/amdgpu: unmap register bar on device init failure")
Reported-by: NVojtech Pavlik <vojtech@ucw.cz>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9f5b8c1b

drm/amd/amdgpu/sriov disable all ip hw status by default · c4a4cc9b

由 Jack Zhang 提交于 10月 14, 2021

stable inclusion
from stable-5.10.51
commit b025bc07c94770ab5ca68a8b2ead12628c2a0698
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b025bc07c94770ab5ca68a8b2ead12628c2a0698

--------------------------------

[ Upstream commit 95ea3dbc ]

Disable all ip's hw status to false before any hw_init.
Only set it to true until its hw_init is executed.

The old 5.9 branch has this change but somehow the 5.11 kernrel does
not have this fix.

Without this change, sriov tdr have gfx IB test fail.
Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com>
Review-by: NEmily Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c4a4cc9b

15 6月, 2021 1 次提交

drm/amd/amdgpu: fix a potential deadlock in gpu reset · b65aa179

由 Lang Yu 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit 4951dd498d483fa961c92541b55ffb32db7f2dbf
bugzilla: 55093
CVE: NA

--------------------------------

[ Upstream commit 9c2876d5 ]

When amdgpu_ib_ring_tests failed, the reset logic called
amdgpu_device_ip_suspend twice, then deadlock occurred.
Deadlock log:

[  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[  806.290952] [drm] free PSP TMR buffer

[  806.319406] ============================================
[  806.320315] WARNING: possible recursive locking detected
[  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
[  806.322135] --------------------------------------------
[  806.323043] cat/2593 is trying to acquire lock:
[  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.325668]
               but task is already holding lock:
[  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.328430]
               other info that might help us debug this:
[  806.329539]  Possible unsafe locking scenario:

[  806.330549]        CPU0
[  806.330983]        ----
[  806.331416]   lock(&adev->dm.dc_lock);
[  806.332086]   lock(&adev->dm.dc_lock);
[  806.332738]
                *** DEADLOCK ***

[  806.333747]  May be due to missing lock nesting notation

[  806.334899] 3 locks held by cat/2593:
[  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.340869]
               stack backtrace:
[  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
[  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
[  806.344413] Call Trace:
[  806.344849]  dump_stack+0x93/0xbd
[  806.345435]  __lock_acquire.cold+0x18a/0x2cf
[  806.346179]  lock_acquire+0xca/0x390
[  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.347813]  __mutex_lock+0x9b/0x930
[  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
[  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
[  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
[  806.354064]  mutex_lock_nested+0x1b/0x20
[  806.354747]  ? mutex_lock_nested+0x1b/0x20
[  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
[  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
[  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
[  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
[  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
Signed-off-by: NLang Yu <Lang.Yu@amd.com>
Acked-by: NChristian KÃnig <christian.koenig@amd.com>
Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b65aa179

03 6月, 2021 1 次提交

drm/amdgpu: Fix some unload driver issues · 246469c5

由 Emily Deng 提交于 5月 22, 2021

stable inclusion
from stable-5.10.36
commit b168fffa3821809d3166e97b055c719f82aa515f
bugzilla: 51867
CVE: NA

--------------------------------

[ Upstream commit bb0cd09b ]

When unloading driver after killing some applications, it will hit sdma
flush tlb job timeout which is called by ttm_bo_delay_delete. So
to avoid the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue
before fence driver fini. And also put drm_sched_fini before waiting fence.
Signed-off-by: NEmily Deng <Emily.Deng@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

246469c5

08 2月, 2021 1 次提交

drm/amdgpu: remove gpu info firmware of green sardine · 179de409

由 Huang Rui 提交于 1月 28, 2021

stable inclusion
from stable-5.10.11
commit 09846950a1b63a91235d2b4b260afa04280d5388
bugzilla: 47621

--------------------------------

commit acc214bf upstream.

The ip discovery is supported on green sardine, it doesn't need gpu info
firmware anymore.
Signed-off-by: NHuang Rui <ray.huang@amd.com>
Reviewed-by: NPrike Liang <Prike.Liang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.10.x
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

179de409

28 1月, 2021 2 次提交

drm/amdgpu: fix a GPU hang issue when remove device · c606ecc8

由 Dennis Li 提交于 1月 25, 2021

stable inclusion
from stable-5.10.9
commit a973bc7d8ab521e35846c793476027b11bbe5a3f
bugzilla: 47457

--------------------------------

[ Upstream commit 88e21af1 ]

When GFXOFF is enabled and GPU is idle, driver will fail to access some
registers. Therefore change to disable power gating before all access
registers with MMIO.

Dmesg log is as following:
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

c606ecc8

drm/amdgpu: fix DRM_INFO flood if display core is not supported (bug 210921) · b3a5938f

由 Alexandre Demers 提交于 1月 25, 2021

stable inclusion
from stable-5.10.9
commit 7fe745881255136e3fd3c96236fdddb4e6fc0642
bugzilla: 47457

--------------------------------

commit ff9346db upstream.

This fix bug 210921 where DRM_INFO floods log when hitting an unsupported ASIC in
amdgpu_device_asic_has_dc_support(). This info should be only called once.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=210921Signed-off-by: NAlexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

b3a5938f

25 11月, 2020 1 次提交

drm/amd/amdgpu: fix null pointer in runtime pm · 7acc79eb

由 Kenneth Feng 提交于 11月 17, 2020

fix the null pointer issue when runtime pm is triggered.
Signed-off-by: NKenneth Feng <kenneth.feng@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

7acc79eb

04 11月, 2020 1 次提交

drm/amdgpu: add green_sardine support for gpu_info and ip block setting (v2) · c38577a4

由 Prike Liang 提交于 10月 01, 2020

This patch adds green_sardine support for gpu_info firmware and ip block setting.

v2: use apu flag
Signed-off-by: NPrike Liang <Prike.Liang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NHuang Rui <ray.huang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c38577a4

29 10月, 2020 2 次提交

amdgpu: fix a few kernel-doc markup issues · b28d70c6

由 Mauro Carvalho Chehab 提交于 10月 27, 2020

A kernel-doc markup can't be mixed with a random comment,
as it causes parsing problems.

While here, change an invalid kernel-doc markup into
a common comment.
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/e899f50404e94ac9a7c3267dd34f951c1a44fb2b.1603791716.git.mchehab+huawei@kernel.orgSigned-off-by: NJonathan Corbet <corbet@lwn.net>

b28d70c6

drm: amdgpu: kernel-doc: update some adev parameters · ca766ff0

由 Mauro Carvalho Chehab 提交于 10月 27, 2020

Running "make htmldocs: produce lots of warnings on those files:
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'

They're related to the repacement of some parameters by adev,
and due to a few renamed parameters.

While here, uniform the name of the parameter for it to be
the same on all functions using a pointer to struct amdgpu_device.

Update the kernel-doc documentation accordingly.
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/5755c2b361890b8ae5cea0f61dfd70b1c135eefe.1603791716.git.mchehab+huawei@kernel.orgSigned-off-by: NJonathan Corbet <corbet@lwn.net>

ca766ff0

22 10月, 2020 1 次提交

drm/amdgpu: correct the gpu reset handling for job != NULL case · 207ac684

由 Evan Quan 提交于 10月 15, 2020

Current code wrongly treat all cases as job == NULL.
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Reviewed-and-tested-by: NJane Jian <Jane.Jian@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

207ac684

10 10月, 2020 1 次提交

drm/amdgpu: prevent spurious warning · 9142c413

由 Alex Deucher 提交于 10月 06, 2020

The default auto setting for kcq should not generate
a warning.

Fixes: a300de40 ("drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5)")
Reviewed-by: NKent Russell <kent.russell@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9142c413

01 10月, 2020 2 次提交

drm/amdgpu: support indirect access reg outside of mmio bar (v2) · f7ee1874

由 Hawking Zhang 提交于 9月 18, 2020

support both direct and indirect accessor in unified
helper functions.

v2: Retire indirect mmio access via mm_index/data
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NKevin Wang <kevin1.wang@amd.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f7ee1874

drm/amdgpu: add helper function for indirect reg access (v3) · 1bba3683

由 Hawking Zhang 提交于 9月 17, 2020

Add helper function in order to remove RREG32/WREG32
in current pcie_rreg/wreg function for soc15 and
onwards adapters.
PCIE_INDEX/DATA pairs are used to access regsiters
outside of mmio bar in the helper functions.
The new helper functions help remove the recursion
of amdgpu_mm_rreg/wreg from pcie_rreg/wreg and
provide the oppotunity to centralize direct and
indirect access in a single function.

v2: Fixed typo and refine the comments

v3: Remove unnecessary volatile local variable
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NKevin Wang <kevin1.wang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1bba3683

30 9月, 2020 1 次提交

drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc. · 0c701415

由 Jiansong Chen 提交于 9月 23, 2020

Remove gpu_info fw support for sienna_cichlid etc., since the
information can be retrieved from discovery binary.
Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: NLikun Gao <Likun.Gao@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

0c701415

26 9月, 2020 4 次提交

drm/amdgpu: stop data_exchange work thread before reset · b602ca5f

由 Tiecheng Zhou 提交于 8月 19, 2020

In FLR routine, init_data_exchange is called at reset_sriov
while fini_data_exchange is not. This will duplicating work
thread.

So call fini_data_exchange before reset for SRIOV
Signed-off-by: NTiecheng Zhou <Tiecheng.Zhou@amd.com>
Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b602ca5f

drm/amdgpu: Implement new guest side VF2PF message transaction (v2) · 519b8b76

由 Bokun Zhang 提交于 7月 28, 2020

- Refactor the driver code to use amdgpu_virt_read_pf2vf_data
  and amdgpu_virt_write_vf2pf_data instead of writing all code in
  one function (which is the old amdgpu_virt_init_data_exchange)

- Adding a new transaction method for VF2PF message between host
  and guest driver. Guest side will periodically update VF2PF
  message in the framebuffer.

  In the new header, we include guest ucode information, guest
  framebuffer usage, and engine usage

- Clean up the old macros since they will cause compile error if
  the new transaction method is used

v2: squash in build fix
Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

519b8b76

drm/amdgpu: store noretry parameter per driver instance · 9b498efa

由 Alex Deucher 提交于 9月 23, 2020

This will allow us to have different defaults per asic
in a future patch.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9b498efa

drm/amdgpu: remove gpu_info fw support for sienna_cichlid etc. · 84d244a3

由 Jiansong Chen 提交于 9月 23, 2020

Remove gpu_info fw support for sienna_cichlid etc., since the
information can be retrieved from discovery binary.
Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com>
Reviewed-by: NLikun Gao <Likun.Gao@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

84d244a3

18 9月, 2020 2 次提交

drm/amdgpu: unmap register bar on device init failure · 4192f7b5

由 Alex Deucher 提交于 9月 16, 2020

We never unmapped the regiser BAR on failure.
Reported-by: Nkernel test robot <lkp@intel.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4192f7b5

drm/amdgpu: No sysfs, not an error condition · 5aea5327

由 Luben Tuikov 提交于 9月 16, 2020

Not being able to create amdgpu sysfs attributes
is not a fatal error warranting not to continue
to try to bring up the display. Thus, if we get
an error trying to create amdgpu sysfs attrs,
report it and continue on to try to bring up
a display.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NSlava Abramov <slava.abramov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5aea5327

16 9月, 2020 8 次提交

drm/amdgpu: Minor checkpatch fix · 7cbbc745

由 Andrey Grodzovsky 提交于 8月 31, 2020

Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7cbbc745

drm/amdgpu: Disable DPC for XGMI for now. · 6894305c

由 Andrey Grodzovsky 提交于 8月 24, 2020

XGMI support is more complicated than single device support as
questions of synchronization between the device recovering from
PCI error and other members of the hive are required.
Leaving this for next round.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6894305c

drm/amdgpu: Trim amdgpu_pci_slot_reset by reusing code. · 7ac71382

由 Andrey Grodzovsky 提交于 8月 24, 2020

Reuse exsisting functions from GPU recovery to avoid code
duplications.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7ac71382

drm/amdgpu: Fix consecutive DPC recovery failures. · c1dd4aa6

由 Andrey Grodzovsky 提交于 8月 24, 2020

Cache the PCI state on boot and before each case where we might
loose it.

v2: Add pci_restore_state while caching the PCI state to avoid
breaking PCI core logic for stuff like suspend/resume.

v3: Extract pci_restore_state from amdgpu_device_cache_pci_state
to avoid superflous restores during GPU resets and suspend/resumes.

v4: Style fixes.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c1dd4aa6

drm/amdgpu: Fix SMU error failure · 362c7b91

由 Andrey Grodzovsky 提交于 8月 24, 2020

Wait for HW/PSP initiated ASIC reset to complete before
starting the recovery operations.

v2: Remove typo
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

362c7b91

drm/amdgpu: Block all job scheduling activity during DPC recovery · acd89fca

由 Andrey Grodzovsky 提交于 7月 29, 2020

DPC recovery involves ASIC reset just as normal GPU recovery so block
SW GPU schedulers and wait on all concurrent GPU resets.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

acd89fca

drm/amdgpu: Avoid accessing HW when suspending SW state · bf36b52e

由 Andrey Grodzovsky 提交于 7月 29, 2020

At this point the ASIC is already post reset by the HW/PSP
so the HW not in proper state to be configured for suspension,
some blocks might be even gated and so best is to avoid touching it.

v2: Rename in_dpc to more meaningful name
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bf36b52e

drm/amdgpu: Implement DPC recovery · c9a6b82f

由 Andrey Grodzovsky 提交于 7月 29, 2020

Add PCI Downstream Port Containment (DPC) with
basic recovery functionality

v2: remove pci_save_state to avoid breaking suspend/resume
v3: Fix style comments
v4: Improve description.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c9a6b82f

04 9月, 2020 2 次提交

drm/amdgpu: Remove superfluous NULL check · 1625951a

由 Luben Tuikov 提交于 9月 01, 2020

The DRM device is a static member of
the amdgpu device structure and as such
always exists, so long as the PCI and
thus the amdgpu device exist.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1625951a

drm/amdgpu: block ring buffer access during GPU recovery · 81202807

由 Dennis Li 提交于 9月 01, 2020

When GPU is in reset, its status isn't stable and ring buffer also need
be reset when resuming. Therefore driver should protect GPU recovery
thread from ring buffer accessed by other threads. Otherwise GPU will
randomly hang during recovery.

v2: correct indent
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

81202807

29 8月, 2020 1 次提交

drm/amdgpu: fix compiler warnings · e230ac11

由 Nirmoy Das 提交于 8月 27, 2020

Fixes below compiler warnings:
 CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_device.o
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
  381 | void static inline amdgpu_mm_wreg_mmio(struct amdgpu_device *adev, uint32_t reg, uint32_t v, uint32_t acc_flags)
      | ^~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_fini’:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3381:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
 3381 |  int r;
      |      ^
Signed-off-by: NNirmoy Das <nirmoy.das@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e230ac11

27 8月, 2020 4 次提交

drm/amdgpu: simplify hw status clear/set logic · 4cd2a96d

由 Jiawei 提交于 8月 27, 2020

Optimize code to iterate less loops in
amdgpu_device_ip_reinit_early_sriov()
Signed-off-by: NJiawei <Jiawei.Gu@amd.com>
Reviewed-by: NEmily.Deng <Emily.Deng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4cd2a96d

drm/amdgpu: report DC not supported if virtual display is enabled (v2) · c997e8e2

由 Alex Deucher 提交于 8月 24, 2020

Virtual display is non-atomic so report false to avoid checking
atomic state and other atomic things at runtime.

v2: squash into the sr-iov check
Acked-by: NNirmoy Das <nirmoy.das@amd.com>
Acked-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c997e8e2

drm/amdgpu: add a wrapper for atom asic_init · 4d2997ab

由 Alex Deucher 提交于 8月 24, 2020

This allows us to add asic specific workarounds for atom
asic init while keeping the adev specifics out of the
atombios parser code.
Acked-by: NNirmoy Das <nirmoy.das@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4d2997ab

drm/amdgpu: Embed drm_device into amdgpu_device (v3) · 8aba21b7

由 Luben Tuikov 提交于 8月 14, 2020

a) Embed struct drm_device into struct amdgpu_device.
b) Modify the inline-f drm_to_adev() accordingly.
c) Modify the inline-f adev_to_drm() accordingly.
d) Eliminate the use of drm_device.dev_private,
   in amdgpu.
e) Switch from using drm_dev_alloc() to
   drm_dev_init().
f) Add a DRM driver release function, which frees
   the container amdgpu_device after all krefs on
   the contained drm_device have been released.

v2: Split out adding adev_to_drm() into its own
    patch (previous commit), making this patch
    more succinct and clear. More detailed commit
    description.
v3: squash in fix to call drmm_add_final_kfree()
    to avoid a warning.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8aba21b7

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功