Commits · afbaa15501125ae0b7de9dd16c6f00c85de14218 · openeuler / Kernel

19 Oct, 2022 1 commit

Revert "drm/amdgpu: add debugfs amdgpu_reset_level" · afbaa155

Authored by Victor Zhao on Oct 13, 2022

This reverts commit 5bd8d53f.

This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.

Fixes: 5bd8d53f ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

afbaa155

21 Sep, 2022 1 commit

drm/amdgpu: add gang submit backend v2 · 68ce8b24

Authored by Christian König on Mar 02, 2022

Allows submitting jobs as gang which needs to run on multiple
engines at the same time.

Basic idea is that we have a global gang submit fence representing when the
gang leader is finally pushed to run on the hardware last.

Jobs submitted as gang are never re-submitted in case of a GPU reset since this
won't work and will just deadlock the hardware immediately again.

v2: fix logic inversion, improve documentation, fix rcu
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

68ce8b24

17 Aug, 2022 3 commits

drm/amdgpu: reduce reset time · 194eb174

Authored by Victor Zhao on Jun 24, 2022

In multi container use case, reset time is important, so skip ring
tests and cp halt wait during ip suspending for reset as they are
going to fail and cost more time on reset

v2: add a hang flag to indicate the reset comes from a job timeout,
skip ring test and cp halt wait in this case

v3: move hang flag to adev
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

194eb174

drm/amdgpu: add debugfs amdgpu_reset_level · 5bd8d53f

Authored by Victor Zhao on Jun 14, 2022

Introduce amdgpu_reset_level debugfs in order to help debug and
test specific type of reset. Also helps blocking unwanted type of
resets.

By default, mode2 reset will not be enabled

v2: make this debugfs in adev and use debugfs_create_u32
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5bd8d53f

drm/amdgpu: Increase tlb flush timeout for sriov · 373008bf

Authored by Dusica Milinkovic on Aug 10, 2022

[Why]
During a multi-VF benchmark run (Luxmark), a KIQ timeout error was observed.
It happens because all of the VFs do the TLB invalidation at the same time.
Although each VF has its own invalidate register set, on the hardware side
the invalidate requests are queued for execution.

[How]
In the case of 12 VFs, increase the timeout to 12*100ms.
Signed-off-by: Dusica Milinkovic <Dusica.Milinkovic@amd.com>
Acked-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

373008bf
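
The timeout arithmetic described in this commit can be sketched as below. This is only an illustration, not the driver's code: the macro and function names are hypothetical, and the 100 ms baseline is taken from the "12*100ms" figure in the message.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
#define DEFAULT_TIMEOUT_US 100000u   /* 100 ms baseline, per the commit message */
#define MAX_VFS            12u       /* the 12-VF case named in the message */
#define SRIOV_TIMEOUT_US   (MAX_VFS * DEFAULT_TIMEOUT_US)

/*
 * Pick the TLB flush timeout: a virtual function waits long enough for
 * every VF's queued invalidate request to be served, bare metal keeps
 * the baseline.
 */
static unsigned int tlb_flush_timeout_us(bool is_sriov_vf)
{
    return is_sriov_vf ? SRIOV_TIMEOUT_US : DEFAULT_TIMEOUT_US;
}
```

With these assumed values a VF waits 1,200,000 us (1.2 s), while a bare-metal device keeps the 100,000 us baseline.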

29 Jul, 2022 1 commit

drm/amdgpu: Fix the incomplete product number · 1f83db6b

Authored by Roy Sun on Jul 20, 2022

The comments say that the product number is a 16-digit HEX string, so the
buffer needs to be at least 17 characters to hold the NUL terminator. Expand
the buffer size to 20 to avoid alignment issues.

The comment: "Product number should only be 16 characters. Any
more, and something could be wrong. Cap it at 16 to be safe."
Signed-off-by: Roy Sun <Roy.Sun@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1f83db6b
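
The buffer-size reasoning in this commit (16 characters plus a NUL terminator need at least 17 bytes) can be sketched in plain C. The names below are hypothetical and this is not the driver's FRU parsing code; it only shows capping a field at 16 characters and terminating it safely.

```c
#include <string.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
#define PRODUCT_NUMBER_DIGITS 16
#define PRODUCT_NUMBER_BUF    20   /* >= 17: 16 digits plus the NUL terminator */

/*
 * Copy at most 16 characters of a raw product-number field into dst and
 * always NUL-terminate. Returns the number of characters stored.
 */
static size_t copy_product_number(char dst[PRODUCT_NUMBER_BUF],
                                  const char *raw, size_t raw_len)
{
    size_t n = raw_len < PRODUCT_NUMBER_DIGITS ? raw_len : PRODUCT_NUMBER_DIGITS;

    memcpy(dst, raw, n);
    dst[n] = '\0';   /* index 16 at most, so a 17-byte buffer is the minimum */
    return n;
}
```

A 16-byte buffer would leave no room for the terminator, which is the "incomplete product number" situation the commit title refers to.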

25 Jul, 2022 1 commit

drm/amd/display: Add visualconfirm module parameter · 792a0cdd

Authored by Leo Li on Jul 06, 2022

[Why]

Being able to configure visual confirm at boot or in cmdline is helpful
when debugging.

[How]

Add a module parameter to configure DC visual confirm, which works the
same way as the equivalent debugfs entry.
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

792a0cdd

19 Jul, 2022 1 commit

drm/amdgpu: drop runpm from amdgpu_device structure · 9c913f38

Authored by Guchun Chen on Jul 14, 2022

It's redundant now that rpm_mode is used to indicate the
runtime power management mode.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9c913f38

13 Jul, 2022 1 commit

drm/amdgpu: support reset flag set for gpu reset · f1549c09

Authored by Likun Gao on Jul 08, 2022

Move reset_context out of the gpu recover function to make it configurable
for different reset purposes.
For resets triggered through the gpu_recovery sysfs, force the full reset
method. Otherwise, try soft reset by default if the related ASIC
supports it; if soft reset fails, fall back to full reset.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f1549c09
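
The reset policy described in this commit can be sketched as a small decision function. This is a simplified model with hypothetical names, not the driver's reset_context handling: the outcome of the soft reset attempt is passed in as a flag rather than performed.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
enum reset_method { RESET_METHOD_SOFT, RESET_METHOD_FULL };

/*
 * Return the reset method that ends up being used:
 *  - forced_full:    the request asked for a full reset (the gpu_recovery case)
 *  - soft_supported: the ASIC supports soft reset
 *  - soft_succeeded: result of the soft reset attempt, if one is made
 */
static enum reset_method effective_reset(bool forced_full, bool soft_supported,
                                         bool soft_succeeded)
{
    if (forced_full || !soft_supported)
        return RESET_METHOD_FULL;

    /* Soft reset is tried first; a failure falls back to full reset. */
    return soft_succeeded ? RESET_METHOD_SOFT : RESET_METHOD_FULL;
}
```

Keeping the policy in a caller-supplied context, as the commit describes, lets each reset source choose its own behaviour instead of hard-coding it in the recover function.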

11 Jun, 2022 2 commits

drm/amdgpu: Rename amdgpu_device_gpu_recover_imp back to amdgpu_device_gpu_recover · cf727044

Authored by Andrey Grodzovsky on May 17, 2022

We removed the wrapper that queued the recover function
into the reset domain queue; that wrapper was using this name.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cf727044

drm/amdgpu: Add work_struct for GPU reset from debugfs · 2f83658f

Authored by Andrey Grodzovsky on May 17, 2022

We need to have a work_struct to cancel this reset if another
is already in progress.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2f83658f

08 Jun, 2022 2 commits

drm/amdgpu: enable ASPM support for PCIE 7.4.0/7.6.0 · 62f8f5c3

Authored by Evan Quan on Apr 28, 2022

Enable ASPM support for PCIE 7.4.0 and 7.6.0.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

62f8f5c3

drm/amdgpu: Add peer-to-peer support among PCIe connected AMD GPUs · 08a2fd23

Authored by Ramesh Errabolu on May 26, 2022

Add support for peer-to-peer communication among AMD GPUs over PCIe
bus. Support REQUIRES enablement of config HSA_AMD_P2P.
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

08a2fd23

07 Jun, 2022 2 commits

drm/amdgpu: adding device coredump support · 3d8785f6

Authored by Somalapuram Amaranath on Jun 02, 2022

Added device coredump information:
- Kernel version
- Module
- Time
- VRAM status
- Guilty process name and PID
- GPU register dumps
v1 -> v2: Variable name change
v1 -> v2: NULL check
v1 -> v2: Code alignment
v1 -> v2: Adding dummy amdgpu_devcoredump_free
v1 -> v2: memset reset_task_info to zero
v2 -> v3: add CONFIG_DEV_COREDUMP for variables
v2 -> v3: remove NULL check on amdgpu_devcoredump_read
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3d8785f6

drm/amdgpu: save the reset dump register value for devcoredump · 651d7ee6

Authored by Somalapuram Amaranath on Jun 02, 2022

Allocate memory for register value and use the same values for devcoredump.
v1 -> v2: Change krealloc_array() to kmalloc_array()
v2 -> v3: Fix alignment
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

651d7ee6

04 Jun, 2022 1 commit

drm/amd: Fix spelling typo in comments · faf26f2b

Authored by pengfuyuan on May 26, 2022

Fix spelling typo in comments.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: pengfuyuan <pengfuyuan@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

faf26f2b

19 May, 2022 2 commits

drm/amd: Don't reset dGPUs if the system is going to s2idle · 7123d39d

Authored by Mario Limonciello on May 17, 2022

An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

This functionality was introduced in commit daf8de08 ("drm/amdgpu:
always reset the asic in suspend (v2)"). A few other commits have
gone on top of the ASIC reset, but this still doesn't work on the A+A
configuration in s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

Fixes: daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2008
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

7123d39d

drm/amd: Don't reset dGPUs if the system is going to s2idle · 0223e516

Authored by Mario Limonciello on May 17, 2022

An A+A configuration on ASUS ROG Strix G513QY proves that the ASIC
reset for handling aborted suspend can't work with s2idle.

Avoid doing the reset on dGPUs specifically when using s2idle.

0223e516

11 May, 2022 2 commits

drm/amdgpu: add lsdma block · 1b491330

Authored by Likun Gao on May 05, 2022

Add Light SDMA (LSDMA) block and related function. LSDMA
is a small instance of SDMA mainly for kernel driver use.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1b491330

drm/amdgpu/psp: Add vbflash sysfs interface support · 8424f2cc

Authored by Likun Gao on Feb 22, 2022

Add sysfs interface to copy VBIOS.

v2: squash in fix for proper vmalloc API (Alex)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8424f2cc

06 May, 2022 1 commit

Revert "drm/amdgpu: disable runpm if we are the primary adapter" · 5a90c24a

Authored by Alex Deucher on May 04, 2022

This reverts commit b95dc06a.

This workaround is no longer necessary.  We have a better workaround
in commit f95af4a9 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)").
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5a90c24a

04 May, 2022 5 commits

drm/amdgpu: add mes_kiq module parameter v2 · 928fe236

Authored by Jack Xiao on Apr 14, 2021

The mes_kiq parameter is used to enable the mes kiq pipe.
This module parameter will be unnecessary, or enabled by default,
in the final version.

v2: reword commit message.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

928fe236

drm/amdgpu: add the per-context meta data v3 · 2bc956ef

Authored by Jack Xiao on Mar 27, 2020

The per-context meta data is a per-context data structure associated
with a mes-managed hardware ring, which includes the MCBP CSA, the ring
buffer, etc.

v2: fix typo
v3: a. use structure instead of typedef
    b. move amdgpu_mes_ctx_get_offs_* to amdgpu_ring.h
    c. use __aligned for alignment
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2bc956ef

drm/amdgpu: define MQD abstract layer for hw ip · 5405a526

Authored by Jack Xiao on Jul 01, 2020

Define an MQD abstraction layer for hw ip so that the mqd
configuration can be passed not only from a ring but also from
other sources, such as a user queue.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5405a526

drm/amdgpu: add tracking for the enablement of SCPM · 7f318f4e

Authored by Likun Gao on May 04, 2022

Add a parameter to show whether the SCPM feature is enabled and
whether it is valid.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7f318f4e

drm/amdgpu: add hdp version 6 functions · 563fcfbf

Authored by Likun Gao on Apr 04, 2022

Unify hdp related function into hdp structure for hdp version 6.
V2: Remove hdp invalidate function as hdp v6 doesn't have read cache.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

563fcfbf

29 Apr, 2022 2 commits

drm/amdgpu: add function to decode ip version · 1d5eee7d

Authored by Likun Gao on Dec 10, 2021

Add function to decode IP version.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d5eee7d

drm/amdgpu: increase HWIP MAX INSTANCE · 3202c7e7

Authored by Likun Gao on Nov 07, 2019

Extend HWIP MAX INSTANCE to 11.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3202c7e7

09 Apr, 2022 1 commit

drm/amdgpu: expand cg_flags from u32 to u64 · 25faeddc

Authored by Evan Quan on Mar 25, 2022

With this, we can support more CG flags.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

25faeddc

05 Mar, 2022 2 commits

drm/amdgpu: header cleanup · a190f8dc

Authored by Christian König on Feb 21, 2022

No functional change, just move a bunch of definitions from amdgpu.h into
separate header files.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a190f8dc

drm/amdgpu/vcn: Add vcn firmware log · 11eb648d

Authored by Ruijing Dong on Mar 02, 2022

The vcn fwlog is for debugging purposes only;
it is disabled by default.
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

11eb648d

25 Feb, 2022 1 commit

drm/amdgpu: Add use_xgmi_p2p module parameter · 158a05a0

Authored by Alex Sierra on Feb 23, 2022

This parameter controls xGMI p2p communication, which is enabled by
default. However, it can be disabled by setting it to 0. In case xGMI
p2p is disabled in a dGPU, PCIe p2p interface will be used instead.
This parameter is ignored in GPUs that do not support xGMI
p2p configuration.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

158a05a0
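
The selection rule in this commit message can be sketched as follows. The enum and function names are hypothetical, and treating PCIe as the path for GPUs without xGMI P2P support is an assumption made for the sketch; the commit only says the parameter is ignored on such GPUs.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the driver's identifiers. */
enum p2p_link { P2P_LINK_XGMI, P2P_LINK_PCIE };

/*
 * Decide which interconnect carries peer-to-peer traffic.
 *  - gpu_supports_xgmi: the GPU supports xGMI P2P configuration
 *  - use_xgmi_p2p:      module parameter value, nonzero by default
 */
static enum p2p_link select_p2p_link(bool gpu_supports_xgmi, int use_xgmi_p2p)
{
    if (!gpu_supports_xgmi)
        return P2P_LINK_PCIE;   /* parameter ignored (assumed PCIe path) */

    return use_xgmi_p2p ? P2P_LINK_XGMI : P2P_LINK_PCIE;
}
```

In this sketch, setting the parameter to 0 on an xGMI-capable GPU yields the PCIe path, matching the behaviour the message describes.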

24 Feb, 2022 5 commits

drm/amdgpu: add debugfs for reset registers list · 5ce5a584

Authored by Somalapuram Amaranath on Feb 23, 2022

List of registers populated for dump collection during the GPU reset.
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5ce5a584

drm/amdgpu: drop testing module parameter · b784f42c

Authored by Alex Deucher on Feb 18, 2022

This test is not particularly useful now that GTT and GART
are decoupled in the driver.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b784f42c

drm/amdgpu: drop benchmark module parameter · 0b1a6348

Authored by Alex Deucher on Feb 18, 2022

Now that we expose the benchmarks via debugfs, there is no
longer a need for the module parameter.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0b1a6348

drm/amdgpu: add a benchmark mutex · f113cc32

Authored by Alex Deucher on Feb 18, 2022

To avoid running multiple benchmarks in parallel and mixing their results.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f113cc32

drm/amdgpu: plumb error handling though amdgpu_benchmark() · e460f244

Authored by Alex Deucher on Feb 18, 2022

So we can tell when this function fails.

v2: squash in error handling fix (Alex)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e460f244

18 Feb, 2022 1 commit

drm/amd: Refactor `amdgpu_aspm` to be evaluated per device · 0ab5d711

Authored by Mario Limonciello on Feb 16, 2022

Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do.  This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.

Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0ab5d711

15 Feb, 2022 1 commit

drm/amdgpu: Show IP discovery in sysfs · a6c40b17

Authored by Luben Tuikov on Feb 03, 2022

Add IP discovery data in sysfs. The format is:
/sys/class/drm/cardX/device/ip_discovery/die/D/B/I/<attrs>
where,
X is the card ID, an integer,
D is the die ID, an integer,
B is the IP HW ID, an integer, aka block type,
I is the IP HW ID instance, an integer.
<attrs> are the attributes of the block instance. At the moment these
include HW ID, instance number, major, minor, revision, number of base
addresses, and the base addresses themselves.

A symbolic link of the acronym HW ID is also created, under D/, if you
prefer to browse by something humanly accessible.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Tom StDenis <tom.stdenis@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a6c40b17
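
The path layout given in this commit message can be illustrated with a small user-space helper that formats the directory of one block instance. The helper name is hypothetical; only the path format itself comes from the message.

```c
#include <stdio.h>

/*
 * Build the sysfs directory for one IP block instance, following the
 * format /sys/class/drm/cardX/device/ip_discovery/die/D/B/I from the
 * commit message. Returns the snprintf() result: the length the full
 * path needs, which is >= len if the buffer was too small.
 */
static int ip_discovery_path(char *buf, size_t len,
                             int card, int die, int hw_id, int instance)
{
    return snprintf(buf, len,
                    "/sys/class/drm/card%d/device/ip_discovery/die/%d/%d/%d",
                    card, die, hw_id, instance);
}
```

The attribute files for that instance (HW ID, major, minor, revision, base addresses, and so on) would then sit under the returned directory.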

10 Feb, 2022 1 commit

drm/amdgpu: Move in_gpu_reset into reset_domain · 89a7a870

Authored by Andrey Grodzovsky on Jan 19, 2022

We should have a single instance per entire reset domain.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74116.html

89a7a870

openeuler / Kernel: synced successfully about 2 years ago