Commits · caa068c9bb2bc86e6da2caf8508f3fda24d4dea0 · openeuler / Kernel

16 Feb, 2023 1 commit

drm/amd/amdgpu: fix warning during suspend · 8f323789

Authored by Jack Xiao on Feb 10, 2023

Freeing memory during suspend triggers a warning.
Move the self test out of suspend.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825
Cc: jfalempe@redhat.com
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com>
Tested-by: Jocelyn Falempe <jfalempe@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

09 Feb, 2023 5 commits

drm/amdgpu: add S/G display parameter · 4693e852

Authored by Alex Deucher on Feb 09, 2023

Some users have reported flickering with S/G display.  We've
tried extensively to reproduce and debug the issue on a wide
variety of platform configurations (DRAM bandwidth, etc.) and
a variety of monitors, but so far have not been able to.  We
disabled S/G display on a number of platforms to address this,
but that leads to "failure to pin framebuffer" errors and
blank displays when there is memory pressure, or no displays
at all on systems with limited carveout (e.g., Chromebooks).
Add an option to disable S/G display, both as a way for users
to turn it off depending on their use case and as an aid for
us to debug this further.

v2: fix typo

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amd/amdgpu: remove test ib on hw ring · 6c1a6d0b

Authored by JesseZhang on Feb 08, 2023

The test IB function is not necessary on the hw ring,
so remove it.

v2: squash in NULL check fix

Signed-off-by: JesseZhang <Jesse.Zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini · 5ad7bbf3

Authored by Guilherme G. Piccoli on Feb 02, 2023

Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine. That function is expected to be called only after the
respective init function, drm_sched_init(), has executed successfully.

We recently hit a driver probe failure on the Steam Deck, and
drm_sched_fini() was called even though its counterpart had not been
called before, causing the following oops:

amdgpu: probe of 0000:04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0000000000000090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 <TASK>
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counterpart.

Note that ideally we'd use sched.ready for that; this field is set as the
last step of drm_sched_init(). But amdgpu seems to "override" the meaning of
this field: in the above oops, for example, it was a GFX ring causing the crash,
and the sched.ready field was set to true in the ring init routine, regardless
of the state of the DRM scheduler. Hence, we ended up using sched.ops as per
Christian's suggestion [0], and also removed the no_scheduler check [1].

[0] https://lore.kernel.org/amd-gfx/984ee981-2906-0eaf-ccec-9f80975cb136@amd.com/
[1] https://lore.kernel.org/amd-gfx/cd0e2994-f85f-d837-609f-7056d5fb7231@amd.com/

Fixes: 067f44c8 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 test (v2)")
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
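
The guard described in this commit message can be modelled in a few lines. What follows is a minimal user-space sketch in Python, not the kernel code: `ToyScheduler`, `ring_teardown`, and the `queue` field are hypothetical stand-ins. Only the idea of keying the teardown on `ops` rather than on `ready` is taken from the commit text.

```python
class ToyScheduler:
    """Hypothetical stand-in for a GPU scheduler (not struct drm_gpu_scheduler)."""

    def __init__(self):
        self.ops = None      # only set by init()
        self.ready = False   # may also be set by unrelated code paths
        self.queue = None    # resource that only init() allocates

    def init(self, ops):
        self.queue = []
        self.ops = ops
        self.ready = True    # set as the last step of init

    def fini(self):
        # Raises AttributeError if init() never ran, because queue is None.
        self.queue.clear()
        self.queue = None
        self.ops = None
        self.ready = False


def ring_teardown(sched):
    """Call fini() only if init() succeeded; return whether fini() ran."""
    if sched.ops is None:    # 'ready' is not trusted here
        return False
    sched.fini()
    return True
```

In this model, a scheduler whose `ready` flag was set by some other path but whose `init()` never ran is skipped by `ring_teardown`, which corresponds to the failed-probe case in the oops above.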

drm/amdgpu: Use the TGID for trace_amdgpu_vm_update_ptes · e53448e0

Authored by Friedrich Vock on Feb 02, 2023

The pid field corresponds to the result of gettid() in userspace.
However, userspace cannot reliably attribute PTE events to processes
with just the thread id. This patch allows userspace to easily
attribute PTE update events to specific processes by comparing this
field with the result of getpid().

For attributing events to specific threads, the thread id is also
contained in the common fields of each trace event.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
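
The difference between the thread id and the process id (TGID) that this commit relies on can be observed from user space. The sketch below is an illustration only and does not read any trace events; `sample_ids` and `collect_from_worker` are hypothetical helper names. It uses `os.getpid()` for the process id and `threading.get_native_id()` (Python 3.8+), which returns the kernel thread id on Linux.

```python
import os
import threading


def sample_ids():
    """Return (tgid, tid) for the calling thread.

    os.getpid() is the process id (TGID); threading.get_native_id()
    is the kernel-assigned thread id, i.e. gettid() on Linux.
    """
    return os.getpid(), threading.get_native_id()


def collect_from_worker():
    """Run sample_ids() in a second thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(sample_ids()))
    worker.start()
    worker.join()
    return result[0]


main_tgid, main_tid = sample_ids()
worker_tgid, worker_tid = collect_from_worker()
```

Both threads report the same process id but different thread ids, which is why a consumer that only sees thread ids cannot group events by process without extra bookkeeping.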

drm/amd/amdgpu: enable athub cg 11.0.3 · 5630a350

Authored by Kenneth Feng on Feb 03, 2023

Enable athub CG on GC 11.0.3.

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

04 Feb, 2023 1 commit

drm/amdgpu: fix memory leak in amdgpu_cs_sync_rings · 9f8b3706

Authored by Bert Karwatzki on Feb 02, 2023

amdgpu_sync_get_fence deletes the returned fence from the
syncobj, so the refcount of the fence needs to be lowered to
avoid a memory leak.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2360
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3b590ba0f11d24b8c6c39c3d38250129c1116af4.camel@web.de
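
The ownership rule behind this leak can be sketched with a toy reference counter. This is a simplified Python model under stated assumptions, not the amdgpu implementation: `ToyFence`, `ToySync`, and `drain_sync` are hypothetical names, and the real control flow of `amdgpu_cs_sync_rings` is not reproduced. The model only shows that a getter which removes an object from its container hands the container's reference to the caller, who must then drop it.

```python
class ToyFence:
    """Hypothetical refcounted object; stands in for a fence."""

    def __init__(self):
        self.refcount = 1
        self.released = False

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.released = True


class ToySync:
    """Hypothetical container that holds one reference per stored fence."""

    def __init__(self):
        self._fences = []

    def add(self, fence):
        self._fences.append(fence.get())

    def get_fence(self):
        """Remove and return one fence; its reference moves to the caller."""
        return self._fences.pop(0) if self._fences else None


def drain_sync(sync, wait):
    """Wait on every stored fence and drop the reference taken over."""
    while (fence := sync.get_fence()) is not None:
        wait(fence)
        fence.put()   # without this put(), the fence is never released
```

Removing the `fence.put()` line in `drain_sync` leaves every fence with a refcount of 1 after the loop, which is the leak pattern the commit message describes.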

02 Feb, 2023 3 commits

drm/amd: Fix initialization for nbio 4.3.0 · 5048fa1e

Authored by Mario Limonciello on Jan 30, 2023

A mistake has been made on some boards with NBIO 4.3.0 where some
NBIO registers aren't properly set by the hardware.

Ensure that they're set during initialization.

Cc: Natikar Basavaraj <Basavaraj.Natikar@amd.com>
Tested-by: Satyanarayana ReddyTVN <Satyanarayana.ReddyTVN@amd.com>
Tested-by: Rutvij Gajjar <Rutvij.Gajjar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

drm/amdgpu: enable HDP SD for gfx 11.0.3 · bb25849c

Authored by Evan Quan on Jan 28, 2023

Enable HDP clock gating control for gfx 11.0.3.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update wave data type to 3 for gfx11 · ed8e793c

Authored by Graham Sider on Jan 16, 2023

SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update
wave data type to signify a difference.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

26 Jan, 2023 3 commits

drm/amdgpu: declare firmware for new MES 11.0.4 · f0f77436

Authored by Li Ma on Jan 20, 2023

To support the new MES IP block.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable imu firmware for GC 11.0.4 · 08fbe3c2

Authored by Li Ma on Jan 20, 2023

GC 11.0.4 needs to load the IMU firmware to power up GFX before loading the GFX firmware.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unconditional trap enable on add gfx11 queues · 2de37698

Authored by Jonathan Kim on Jan 19, 2023

A rebase of the driver introduced an incorrect unconditional trap
enablement for GFX11 when adding MES queues.

Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

19 Jan, 2023 4 commits

drm/amdgpu: allow multipipe policy on ASICs with one MEC · dc88063b

Authored by Lang Yu on Jan 11, 2023

Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.

This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

drm/amdgpu: correct MEC number for gfx11 APUs · 0ddadc3a

Authored by Lang Yu on Jan 11, 2023

There is only one MEC on these APUs.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

drm/amdgpu: fix amdgpu_job_free_resources v2 · 74ea8e78

Authored by Christian König on Jan 12, 2023

It can happen that neither fence was initialized, for example when we
run out of UVD streams.

v2: fix typo breaking compile

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2324
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x

drm/amdgpu: fix cleaning up reserved VMID on release · 4463b1ee

Authored by Christian König on Jan 13, 2023

We need to reset this or otherwise run into list corruption later on.

Fixes: e44a0fe6 ("drm/amdgpu: rework reserved VMID handling")
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

11 Jan, 2023 1 commit

drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU · a6941f89

Authored by Eric Huang on Jan 05, 2023

The pointer bo->kfd_bo is NULL for the queue's write pointer BO
when creating a queue on mGPU. Avoiding use of the pointer
fixes the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

10 Jan, 2023 4 commits

drm/amdgpu: fix pipeline sync v2 · 3bd68b32

Authored by Christian König on Jan 09, 2023

This fixes a potential memory leak of dma_fence objects in the CS code
as well as glitches in Firefox because of a missing pipeline sync.

v2: use the scheduler instead of the fence context

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com

drm/amdgpu: Fixed bug on error when unloading amdgpu · 99f1a36c

Authored by YiPeng Chai on Jan 06, 2023

Fix the kernel BUG that is hit when unloading amdgpu.

The error message is as follows:
[  377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[  377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G          IOE      6.0.0-thomas #1
[  377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[  377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[  377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[  377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[  377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[  377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[  377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[  377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[  377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[  377.706325] FS:  00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[  377.706334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[  377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.706361] Call Trace:
[  377.706365]  <TASK>
[  377.706369]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[  377.706376]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[  377.706572]  amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[  377.706650]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
[  377.706727]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[  377.706821]  amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[  377.706897]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[  377.706975]  drm_dev_release+0x20/0x40 [drm]
[  377.707006]  release_nodes+0x35/0xb0
[  377.707014]  devres_release_all+0x8b/0xc0
[  377.707020]  device_unbind_cleanup+0xe/0x70
[  377.707027]  device_release_driver_internal+0xee/0x160
[  377.707033]  driver_detach+0x44/0x90
[  377.707039]  bus_remove_driver+0x55/0xe0
[  377.707045]  pci_unregister_driver+0x3b/0x90
[  377.707052]  amdgpu_exit+0x11/0x6c [amdgpu]
[  377.707194]  __x64_sys_delete_module+0x142/0x2b0
[  377.707201]  ? fpregs_assert_state_consistent+0x22/0x50
[  377.707208]  ? exit_to_user_mode_prepare+0x3e/0x190
[  377.707215]  do_syscall_64+0x38/0x90
[  377.707221]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amd: Delay removal of the firmware framebuffer · 1923bc5a

Authored by Mario Limonciello on Dec 27, 2022

Removing the firmware framebuffer from the driver means that even
if the driver doesn't support the IP blocks in a GPU, the firmware
framebuffer will no longer be functional after the driver fails to
initialize.

This change ensures that a system with unsupported IP blocks at least
keeps working with the EFI framebuffer.

Cc: stable@vger.kernel.org
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix potential NULL dereference · 0be7ed8e

Authored by Luben Tuikov on Jan 04, 2023

Fix a potential NULL dereference when "man", the resource manager,
is NULL and we print debug information.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Fixes: 7554886d ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

06 Jan, 2023 2 commits

drm/amdgpu: fix missing dma_fence_put in error path · 41cc108b

Authored by Christian König on Jan 05, 2023

When the fence can't be added we need to drop the reference.

Suggested-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-2-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>

drm/amdgpu: fix another missing fence reference in the CS code · ed21f6c3

Authored by Christian König on Jan 05, 2023

drm_sched_job_add_dependency() consumes the references of the gang
members. Only triggered by mesh shaders.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1728baa7 ("drm/amdgpu: use scheduler dependencies for CS")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-1-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
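
When a callee consumes the reference it is given, a caller that keeps using the object has to take an extra reference first. The following Python sketch models only that rule; it is not the scheduler API, and `ToyFence`, `add_dependency`, `depend_on_gang_member`, and `release_dependencies` are hypothetical names chosen for illustration.

```python
class ToyFence:
    """Hypothetical refcounted object; only counts references."""

    def __init__(self):
        self.refcount = 1

    def get(self):
        self.refcount += 1
        return self

    def put(self):
        self.refcount -= 1


def add_dependency(deps, fence):
    """Consume the caller's reference: it now belongs to the deps list."""
    deps.append(fence)


def depend_on_gang_member(deps, member_fence):
    """The caller keeps using member_fence, so hand over an extra reference."""
    add_dependency(deps, member_fence.get())


def release_dependencies(deps):
    """Drop the references owned by the deps list."""
    while deps:
        deps.pop().put()
```

If `depend_on_gang_member` passed `member_fence` without calling `get()`, releasing the dependency list would drop the only reference while the caller still held a pointer to the fence.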

05 Jan, 2023 1 commit

Revert "drm/amd/display: Enable Freesync Video Mode by default" · 6fe6ece3

Authored by Michel Dänzer on Dec 21, 2022

This reverts commit de05abe6.

The bug referenced below was bisected to this commit. There has been no
activity toward fixing it in 3 months, so let's revert for now.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

22 Dec, 2022 2 commits

drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency · c1c4a8b2

Authored by Christian König on Dec 19, 2022

That function consumes the reference.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aab9cf7b ("drm/amdgpu: use scheduler dependencies for VM updates")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable VCN DPG for GC IP v11.0.4 · e1d900df

Authored by Saleemkhan Jamadar on Dec 20, 2022

Enable VCN Dynamic Power Gating control for GC IP v11.0.4.

Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1

21 Dec, 2022 3 commits

drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0 · 8660495a

Authored by Tim Huang on Dec 19, 2022

MES is part of gfxoff, and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in amdgpu_device_ip_late_init.
It should also be skipped for s0ix, as no hardware re-initialization
happened.

Besides, mes_self_test frees the BO, which triggers a lot of warning
messages while in the suspend state.

[   81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[   81.679435] Call Trace:
[   81.679726]  <TASK>
[   81.679981]  amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[   81.680857]  amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[   81.681665]  mes_v11_0_late_init+0x37/0x50 [amdgpu]
[   81.682423]  amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[   81.683257]  amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[   81.684043]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[   81.684818]  pci_pm_resume+0x5c/0xa0
[   81.685247]  ? pci_pm_thaw+0x90/0x90
[   81.685658]  dpm_run_callback+0x4e/0x160
[   81.686110]  device_resume+0xad/0x210
[   81.686529]  async_resume+0x1e/0x40
[   81.686931]  async_run_entry_fn+0x33/0x120
[   81.687405]  process_one_work+0x21d/0x3f0
[   81.687869]  worker_thread+0x4a/0x3c0
[   81.688293]  ? process_one_work+0x3f0/0x3f0
[   81.688777]  kthread+0xff/0x130
[   81.689157]  ? kthread_complete_and_exit+0x20/0x20
[   81.689707]  ret_from_fork+0x22/0x30
[   81.690118]  </TASK>
[   81.690380] ---[ end trace 0000000000000000 ]---

v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend

Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1

drm/amdgpu: skip MES for S0ix as well since it's part of GFX · afa6646b

Authored by Alex Deucher on Dec 16, 2022

It's also part of gfxoff.

Cc: stable@vger.kernel.org # 6.0, 6.1
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix double release compute pasid · 1a799c4c

Authored by Philip Yang on Dec 13, 2022

If kfd_process_device_init_vm returns failure after the vm is converted to
a compute vm and vm->pasid is set to the compute pasid, KFD will not take
the pdd->drm_file reference. As a result, the drm close file handler may be
called to release the compute pasid before the KFD process destroy worker
releases the same pasid and sets vm->pasid to zero. This generates the
WARNING backtrace and NULL pointer access below.

Add helper amdgpu_amdkfd_gpuvm_set_vm_pasid and call it at the last step
of kfd_process_device_init_vm, to ensure the vm pasid is the original pasid
if acquiring the vm failed, or is the compute pasid with the pdd->drm_file
reference taken, to avoid a double release of the same pasid.

 amdgpu: Failed to create process VM object
 ida_free called for id=32770 which is not allocated.
 WARNING: CPU: 57 PID: 72542 at ../lib/idr.c:522 ida_free+0x96/0x140
 RIP: 0010:ida_free+0x96/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 RIP: 0010:ida_free+0x76/0x140
 Call Trace:
  amdgpu_pasid_free_delayed+0xe1/0x2a0 [amdgpu]
  amdgpu_driver_postclose_kms+0x2d8/0x340 [amdgpu]
  drm_file_free.part.13+0x216/0x270 [drm]
  drm_close_helper.isra.14+0x60/0x70 [drm]
  drm_release+0x6e/0xf0 [drm]
  __fput+0xcc/0x280
  ____fput+0xe/0x20
  task_work_run+0x96/0xc0
  do_exit+0x3d0/0xc10
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15 Dec, 2022 4 commits

drm/amdgpu: revert "generally allow over-commit during BO allocation" · 47722220

Authored by Christian König on Dec 12, 2022

This reverts commit f9d00a4a.

This causes problems for KFD because when we overcommit we accidentally
bind the BO to GTT for moving it into VRAM. We also need to make sure
that this is done only as a fallback after trying to evict first.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove unnecessary domain argument · 3273f116

Authored by Luben Tuikov on Dec 14, 2022

Remove the "domain" argument to amdgpu_bo_create_kernel_at() since this
function takes an "offset" argument which is the offset off of VRAM, and as
such allocation always takes place in VRAM. Thus, the "domain" argument is
unnecessary.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix size validation for non-exclusive domains (v4) · 7554886d

Authored by Luben Tuikov on Dec 10, 2022

Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
requested memory exists, else we get a kernel oops when dereferencing "man".

v2: Make the patch standalone, i.e. not dependent on local patches.
v3: Preserve old behaviour and just check that the manager pointer is not
    NULL.
v4: Complain if GTT domain requested and it is uninitialized--most likely a
    bug.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Check if fru_addr is not NULL (v2) · 28afcb0a

Authored by Luben Tuikov on Dec 12, 2022

Always check if fru_addr is not NULL. This commit also fixes a "smatch"
warning.

v2: Add a Fixes tag.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Fixes: afbe5d1e ("drm/amdgpu: Bug-fix: Reading I2C FRU data on newer ASICs")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Dec, 2022 6 commits

drm/amdgpu: rework reserved VMID handling · e44a0fe6

Authored by Christian König on Nov 25, 2022

Instead of reserving a VMID for a single process, allow many
processes to use the reserved ID. This allows for proper isolation
between the processes.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: stop waiting for the VM during unreserve · 053499f7

Authored by Christian König on Nov 25, 2022

This is completely pointless since the VMID always stays allocated until
the VM is idle.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup SPM support a bit · 5f3c40e9

Authored by Christian König on Nov 25, 2022

This should probably not access job->vm and also emit the SPM switch
under the conditional execute.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix GDS/GWS/OA switch handling · 56b0989e

Authored by Christian König on Nov 25, 2022

Bas pointed out that this isn't working as expected and could cause
crashes. Fix the handling by storing the marker that a switch is needed
inside the job instead.

Reported-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add notifier lock for KFD userptrs · f95f51a4

Authored by Felix Kuehling on Apr 21, 2021

Add a per-process MMU notifier lock for processing notifiers from
userptrs. Use that lock to properly synchronize page table updates with
MMU notifiers.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: WARN when freeing kernel memory during suspend · 4d2ccd96

Authored by Christian König on Nov 16, 2022

When buffers are freed during suspend there is no guarantee that
they can be re-allocated during resume.

The PSP subsystem seems to be quite buggy regarding this, so add
a WARN_ON() to point out those bugs.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexdeucher@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel: synced successfully about 2 years ago
