Commits · 3d8785f6c04a953868384db455bb2fdd0b22c11c · openeuler / Kernel

07 Jun, 2022 2 commits

drm/amdgpu: adding device coredump support · 3d8785f6

Committed by Somalapuram Amaranath on Jun 02, 2022

Added device coredump information:
- Kernel version
- Module
- Time
- VRAM status
- Guilty process name and PID
- GPU register dumps
v1 -> v2: Variable name change
v1 -> v2: NULL check
v1 -> v2: Code alignment
v1 -> v2: Adding dummy amdgpu_devcoredump_free
v1 -> v2: memset reset_task_info to zero
v2 -> v3: add CONFIG_DEV_COREDUMP for variables
v2 -> v3: remove NULL check on amdgpu_devcoredump_read
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3d8785f6
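
As a rough illustration of the fields listed in this commit message, the following user-space sketch formats them into a text buffer. It is not the driver's code: `struct dump_info`, `append()`, `format_dump()` and the output layout are all hypothetical names and choices made for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields named in the commit message. */
struct dump_info {
	const char *kernel_version;
	const char *module;
	long long time_sec;
	int vram_lost;
	const char *process_name;
	int pid;
	const uint32_t *reg_offsets;
	const uint32_t *reg_values;
	size_t num_regs;
};

/* Append formatted text at *pos; returns -1 if it would not fit. */
static int append(char *buf, size_t size, size_t *pos, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *pos, size - *pos, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= size - *pos)
		return -1;
	*pos += (size_t)n;
	return 0;
}

/* Returns the number of bytes written, or -1 if the buffer is too small. */
static int format_dump(const struct dump_info *info, char *buf, size_t size)
{
	size_t pos = 0, i;

	assert(info != NULL && buf != NULL);
	if (size == 0)
		return -1;
	if (append(buf, size, &pos, "kernel: %s\n", info->kernel_version) ||
	    append(buf, size, &pos, "module: %s\n", info->module) ||
	    append(buf, size, &pos, "time: %lld\n", info->time_sec) ||
	    append(buf, size, &pos, "vram_lost: %s\n", info->vram_lost ? "yes" : "no") ||
	    append(buf, size, &pos, "process: %s pid: %d\n", info->process_name, info->pid))
		return -1;
	for (i = 0; i < info->num_regs; i++)
		if (append(buf, size, &pos, "reg 0x%08x: 0x%08x\n",
			   (unsigned int)info->reg_offsets[i],
			   (unsigned int)info->reg_values[i]))
			return -1;
	return (int)pos;
}
```

A caller would fill a `struct dump_info` at reset time and pass a buffer to `format_dump()`; the real driver hands its data to the kernel's device coredump facility instead of a plain buffer.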

drm/amdgpu: save the reset dump register value for devcoredump · 651d7ee6

Committed by Somalapuram Amaranath on Jun 02, 2022

Allocate memory for the register values and use the same values for devcoredump.
v1 -> v2: Change krealloc_array() to kmalloc_array()
v2 -> v3: Fix alignment
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Shashank Sharma <Shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

651d7ee6

04 Jun, 2022 4 commits

drm/amdgpu: fix up comment in amdgpu_device_asic_has_dc_support() · b5a0168e

Committed by Alex Deucher on May 24, 2022

LVDS support was implemented in DC a while ago.  Just DAC
support is left to do.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b5a0168e

drm/amdgpu: simplify the logic in amdgpu_device_parse_gpu_info_fw() · 1d6c3633

Committed by Alex Deucher on May 24, 2022

Drop all of the extra cases in the default case.
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d6c3633

amdgpu: amdgpu_device.c: Removed trailing whitespace · f74e78ca

Committed by Mitchell Augustin on May 25, 2022

Removed trailing whitespace from end of line in amdgpu_device.c
Signed-off-by: Mitchell Augustin <kernel@mitchellaugustin.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f74e78ca

drm/amdgpu: simplify amdgpu_device_asic_has_dc_support() · b8b64595

Committed by Alex Deucher on May 24, 2022

Drop extra cases in the default case.
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b8b64595

27 May, 2022 2 commits

drm/amdgpu: move amdgpu_gmc_tmz_set after ip_version populated · 4d33e704

Committed by Sunil Khatri on May 17, 2022

Enabling the TMZ feature based on IP version requires adev->ip_version
to be populated, but it is still empty at this point. Move
amdgpu_gmc_tmz_set to a place where ip_version is populated.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4d33e704

drm/amdgpu: support ras on SRIOV · 950d6425

Committed by Stanley.Yang on Apr 27, 2022

Support umc/gfx/sdma RAS on the guest side.

Changed from V1:
    move sriov judgment in amdgpu_ras_interrupt_fatal_error_handler
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

950d6425

11 May, 2022 1 commit

drm/amdgpu/psp: Add vbflash sysfs interface support · 8424f2cc

Committed by Likun Gao on Feb 22, 2022

Add sysfs interface to copy VBIOS.

v2: squash in fix for proper vmalloc API (Alex)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8424f2cc

07 May, 2022 1 commit

drm/amdgpu: flush delete wq after wait fence · 98f56188

Committed by Yiqing Yao on May 05, 2022

[why]
A "lru_list not empty" warning appears in sw fini during repeated device bind/unbind.
There should be an amdgpu_fence_wait_empty() before the flush_delayed_work()
call, as Christian suggested.

[how]
Flush the delayed work of the ttm bo delayed delete wq after fence_driver_hw_fini.

Tested by: Yiqing Yao <yiqing.yao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Yiqing Yao <yiqing.yao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

98f56188

04 May, 2022 4 commits

drm/amdgpu: Enable KFD with MES enabled · c004d44e

Committed by Mukul Joshi on Mar 31, 2022

Enable KFD initialization with MES enabled.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c004d44e

drm/amdgpu: skip kfd routines when mes enabled · 9c12f5cd

Committed by Jack Xiao on Mar 31, 2022

KFD does not support MES yet, so skip the KFD routines when MES is enabled.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9c12f5cd

drm/amdgpu: add mes_kiq module parameter v2 · 928fe236

Committed by Jack Xiao on Apr 14, 2021

The mes_kiq parameter is used to enable the MES KIQ pipe.
This module parameter will be unnecessary, or enabled by default,
in the final version.

v2: reword commit message.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

928fe236

drm/amdgpu: use the whole doorbell space for mes · de33a329

Committed by Jack Xiao on Mar 20, 2020

Use the whole doorbell space for MES. Each queue in a process occupies
one doorbell slot, which is rung on queue submission.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

de33a329

29 Apr, 2022 2 commits

drm/amdgpu: switch to atomfirmware_asic_init · 85d1bcc6

Committed by Hawking Zhang on Feb 28, 2022

Some initial settings are no longer available from
the atom data table. The assumption that !ps[0]
|| !ps[1] in amdgpu_atom_asic_init is not valid.
In addition, the driver needs to strictly follow the
atomfirmware structure (asic_init_parameters) to
initialize the parameters used to execute the asic_init
function; otherwise, the execution of asic_init
would fail.

This should be applicable to all soc15 adapters, but
let's make the transition on soc21 first.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

85d1bcc6

drm/amdgpu/discovery: move all table parsing into amdgpu_discovery.c · e24d0e91

Committed by Alex Deucher on Mar 30, 2022

This data has no dependencies, so encapsulate it all within
amdgpu_discovery.c.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e24d0e91

22 Apr, 2022 1 commit

drm/amd/amdgpu: Update PF2VF header · e15c9d06

Committed by Bokun Zhang on Apr 21, 2022

- In the latest version of the header, there is a variable name change.
  This should not cause any backward compatibility issues since the
  variable is at the same offset in the struct.
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e15c9d06

09 Apr, 2022 1 commit

drm/amdgpu: expand cg_flags from u32 to u64 · 25faeddc

Committed by Evan Quan on Mar 25, 2022

With this, we can support more CG flags.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

25faeddc
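
To see why a wider type is needed for additional flags, the sketch below ORs a flag at bit 32 into a 32-bit and a 64-bit flags word. The flag `EXAMPLE_CG_FLAG_BIT32` and both helper functions are hypothetical, chosen only for this illustration, and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag occupying bit 32, just past what a u32 can hold. */
#define EXAMPLE_CG_FLAG_BIT32 (1ULL << 32)

/* With 32-bit storage, any flag above bit 31 is silently truncated away. */
static uint32_t set_flag_u32(uint32_t flags, uint64_t flag)
{
	return (uint32_t)(flags | flag);
}

/* With 64-bit storage, the same flag is preserved. */
static uint64_t set_flag_u64(uint64_t flags, uint64_t flag)
{
	return flags | flag;
}
```

Setting `EXAMPLE_CG_FLAG_BIT32` through `set_flag_u32()` leaves the word at zero, while `set_flag_u64()` keeps the bit.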

26 Mar, 2022 1 commit

drm/amdgpu/gmc: use PCI BARs for APUs in passthrough · b818a5d3

Committed by Alex Deucher on Mar 09, 2022

If the GPU is passed through to a guest VM, use the PCI
BAR for CPU FB access rather than the physical address of
the carve out.  The physical address is not valid in a guest.

v2: Fix HDP handling as suggested by Michel
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b818a5d3

16 Mar, 2022 2 commits

drm/amdgpu: Move reset domain init before calling RREG32 · 436afdfa

Committed by Philip Yang on Mar 15, 2022

amdgpu_detect_virtualization reads a register, and amdgpu_device_rreg accesses
adev->reset_domain->sem if the kernel is built with CONFIG_LOCKDEP. Below is the
random boot hang backtrace on Vega10. A random NULL pointer access backtrace
may also occur if amdgpu_sriov_runtime is true.

Move amdgpu_reset_create_reset_domain before the first call to RREG32.

 BUG: kernel NULL pointer dereference, address:
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 Workqueue: events work_for_cpu_fn
 RIP: 0010:down_read_trylock+0x13/0xf0
 Call Trace:
  <TASK>
  amdgpu_device_skip_hw_access+0x38/0x80 [amdgpu]
  amdgpu_device_rreg+0x1b/0x170 [amdgpu]
  amdgpu_detect_virtualization+0x73/0x100 [amdgpu]
  amdgpu_device_init.cold.60+0xbe/0x16b1 [amdgpu]
  ? pci_bus_read_config_word+0x43/0x70
  amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
  amdgpu_pci_probe+0x1a1/0x3a0 [amdgpu]

Fixes: d0fb18b5 ("drm/amdgpu: Move reset sem into reset_domain")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

436afdfa

drm/amdgpu: only check for _PR3 on dGPUs · 85ac2021

Committed by Alex Deucher on Jan 25, 2022

We don't support runtime pm on APUs.  They support more
dynamic power savings using clock and powergating.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

85ac2021

03 Mar, 2022 2 commits

drm/amdgpu: remove unused gpu_info firmwares · 1b537e64

Committed by Alex Deucher on Mar 01, 2022

These were leftover from bring up and are no longer
necessary.  The information is available via
the IP discovery table.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1b537e64

drm/amdgpu: move amdgpu_gmc_noretry_set after ip_versions populated · 957b0787

Committed by Yifan Zhang on Mar 01, 2022

Otherwise adev->ip_versions is still empty when amdgpu_gmc_noretry_set
is called.
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

957b0787

25 Feb, 2022 1 commit

drm/amdgpu: Exclude PCI reset method for now. · 2656fd23

Committed by Andrey Grodzovsky on Feb 24, 2022

According to my recent investigation of the state of PCI
reset, it is not working. The reason is that
the kernel PCI code rejects SBR
when there is more than one PF under the same bridge,
which we always have (at least the AUDIO PF, but usually
more), because SBR will reset all the PFs
and devices under the same bridge as you, and you
cannot assume they support SBR.
Once we enable FLR support we can re-enable this option, as
FLR is doable on a single PF and doesn't have this
restriction.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2656fd23

24 Feb, 2022 4 commits

drm/amdgpu: add reset register dump trace on GPU · 15fd09a0

Committed by Somalapuram Amaranath on Feb 23, 2022

Dump the list of register values to a trace event on GPU reset.
Signed-off-by: Somalapuram Amaranath <Amaranath.Somalapuram@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15fd09a0

drm/amdgpu: drop testing module parameter · b784f42c

Committed by Alex Deucher on Feb 18, 2022

This test is not particularly useful now that GTT and GART
are decoupled in the driver.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b784f42c

drm/amdgpu: drop benchmark module parameter · 0b1a6348

Committed by Alex Deucher on Feb 18, 2022

Now that we expose the benchmarks via debugfs, there is no
longer a need for the module parameter.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0b1a6348

drm/amdgpu: add a benchmark mutex · f113cc32

Committed by Alex Deucher on Feb 18, 2022

Prevent multiple benchmark runs in parallel so that their results do not get mixed.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f113cc32

23 Feb, 2022 1 commit

drm/sched: Add device pointer to drm_gpu_scheduler · 8ab62eda

Committed by Jiawei Gu on Feb 22, 2022

Add a device pointer so the scheduler's printing can use
DRM_DEV_ERROR() instead, which makes life easier in multi-GPU
scenarios.

v2: amend all calls of drm_sched_init()
v3: fill dev pointer for all drm_sched_init() calls
Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220221095705.5290-1-Jiawei.Gu@amd.com

8ab62eda

18 Feb, 2022 2 commits

drm/amd: Refactor `amdgpu_aspm` to be evaluated per device · 0ab5d711

Committed by Mario Limonciello on Feb 16, 2022

Evaluating `pcie_aspm_enabled` as part of driver probe has the implication
that if one PCIe bridge with an AMD GPU connected doesn't support ASPM
then none of them do.  This is an invalid assumption as the PCIe core will
configure ASPM for individual PCIe bridges.

Create a new helper function that can be called by individual dGPUs to
react to the `amdgpu_aspm` module parameter without having negative results
for other dGPUs on the PCIe bus.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0ab5d711

drm/amdgpu: define amdgpu_ras_late_init to call all ras blocks' .ras_late_init · 867e24ca

Committed by yipechai on Feb 14, 2022

Define amdgpu_ras_late_init to call all ras blocks' .ras_late_init.
Signed-off-by: yipechai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

867e24ca

17 Feb, 2022 1 commit

drm/amdgpu: make cyan skillfish support code more consistent · dfcc3e8c

Committed by Alex Deucher on Feb 14, 2022

Since this is an existing asic, adjust the code to follow
the same logic as previously so the driver state is consistent.

No functional change intended.
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dfcc3e8c

15 Feb, 2022 2 commits

drm/amdgpu: Handle the GPU recovery failure in SRIOV environment. · 7258fa31

Committed by Surbhi Kakarya on Jan 26, 2022

This patch handles GPU recovery failure in an SRIOV environment by
retrying the reset if the first reset fails. To determine whether
to retry, a new macro AMDGPU_RETRY_SRIOV_RESET is added, which returns
true if the failure is due to ETIMEDOUT, EINVAL or EBUSY, and false
otherwise. A new macro AMDGPU_MAX_RETRY_LIMIT is used to limit the retries to 2.

It also handles the return status in Post Asic Reset by updating the return
code with asic_reset_res and eventually returning that code in
amdgpu_job_timedout().
Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7258fa31
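
The retry policy described in this commit message can be sketched in user-space C as follows. The two macro names come from the message; their bodies here are reconstructed from its description and are an assumption, as are the hypothetical `reset_fn`, `struct fake_gpu`, `fake_reset()` and `reset_with_retry()` helpers, which only stand in for the real reset path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed definitions, reconstructed from the commit message text. */
#define AMDGPU_MAX_RETRY_LIMIT 2
#define AMDGPU_RETRY_SRIOV_RESET(r) \
	((r) == -EBUSY || (r) == -ETIMEDOUT || (r) == -EINVAL)

/* Hypothetical stand-in for the real SR-IOV reset routine. */
typedef int (*reset_fn)(void *ctx);

/* Hypothetical test double: fails with -ETIMEDOUT a set number of times. */
struct fake_gpu {
	int failures_left;
	int calls;
};

static int fake_reset(void *ctx)
{
	struct fake_gpu *gpu = ctx;

	gpu->calls++;
	if (gpu->failures_left > 0) {
		gpu->failures_left--;
		return -ETIMEDOUT;
	}
	return 0;
}

/* One initial attempt, then at most AMDGPU_MAX_RETRY_LIMIT retries. */
static int reset_with_retry(reset_fn reset, void *ctx)
{
	int retries = 0;
	int r;

	assert(reset != NULL);
	for (;;) {
		r = reset(ctx);
		/* Stop on success, on a non-retryable error, or at the limit. */
		if (!AMDGPU_RETRY_SRIOV_RESET(r) || retries >= AMDGPU_MAX_RETRY_LIMIT)
			return r;
		retries++;
	}
}
```

With this shape, a reset that keeps timing out is attempted three times in total (one initial attempt plus two retries) before the error code is returned to the caller.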

drm/amdgpu: Fix a kerneldoc warning · 71579346

Committed by Rajneesh Bhardwaj on Feb 10, 2022

Add missing parameters to fix a kerneldoc warning
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

71579346

12 Feb, 2022 1 commit

drm/amdgpu: Fix htmldoc warning · c7703ce3

Committed by Andrey Grodzovsky on Feb 11, 2022

Update function name.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220211205500.601391-1-andrey.grodzovsky@amd.com

c7703ce3

10 Feb, 2022 5 commits

drm/amdgpu: Revert 'drm/amdgpu: annotate a false positive recursive locking' · 3675c2f2

Committed by Andrey Grodzovsky on Jan 25, 2022

Since we have a single instance of the reset semaphore, which we
lock only once even for an XGMI hive, we don't need the nested
locking hint anymore.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74120.html

3675c2f2

drm/amdgpu: Rework amdgpu_device_lock_adev · e923be99

Committed by Andrey Grodzovsky on Jan 25, 2022

This function needs to be split into two parts: one
is called only once to lock the single instance of
reset_domain's sem and reset flag, and the other,
which handles MP1 states, should still be called for
each device in the XGMI hive.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74118.html

e923be99

drm/amdgpu: Move in_gpu_reset into reset_domain · 89a7a870

Committed by Andrey Grodzovsky on Jan 19, 2022

We should have a single instance per entire reset domain.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

89a7a870

drm/amdgpu: Move reset sem into reset_domain · d0fb18b5

Committed by Andrey Grodzovsky on Jan 19, 2022

We want a single instance of the reset sem across all
reset clients because, in the case of XGMI, we should stop
cross-device MMIO access since any of the devices could be
in reset at the moment.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html

d0fb18b5

drm/amdgpu: Rework reset domain to be refcounted. · cfbb6b00

Committed by Andrey Grodzovsky on Jan 21, 2022

The reset domain now contains the register access semaphore
and so needs to be present as long as each device
in a hive needs it; it therefore cannot be bound to the XGMI
hive life cycle.
Address this by making the reset domain refcounted and pointed to
by each member of the hive and by the hive itself.

v4:

Fix crash on boot with XGMI hive by adding type to reset_domain.
XGMI will only create a new reset_domain if the previous one was of single
device type, meaning it's the first boot. Otherwise it will take a
refcount on the existing reset_domain from the amdgpu device.

Add a wrapper around reset_domain->refcount get/put
and a wrapper around send to reset wq (Lijo)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74121.html

cfbb6b00
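
The lifetime rule described in this commit message (the domain lives until its last user drops it, independent of the hive) can be illustrated with a minimal single-threaded user-space sketch. Everything here is hypothetical: the type and function names are invented for the example, and a plain integer stands in for the kernel's reference counting primitives.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical domain types, mirroring the "type" mentioned in v4. */
enum domain_type { SINGLE_DEVICE, XGMI_HIVE };

struct reset_domain {
	enum domain_type type;
	int refcount; /* plain int: this sketch is single-threaded only */
};

/* Create a domain holding one reference for its creator. */
static struct reset_domain *reset_domain_create(enum domain_type type)
{
	struct reset_domain *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->type = type;
	d->refcount = 1;
	return d;
}

/* Take an additional reference, e.g. for each device joining a hive. */
static void reset_domain_get(struct reset_domain *d)
{
	assert(d != NULL && d->refcount > 0);
	d->refcount++;
}

/* Drop a reference; frees the domain and returns 1 when it was the last. */
static int reset_domain_put(struct reset_domain *d)
{
	assert(d != NULL && d->refcount > 0);
	if (--d->refcount == 0) {
		free(d);
		return 1;
	}
	return 0;
}
```

In this model the hive and each member device hold one reference apiece, so the hive can be torn down first without invalidating the semaphore that the devices still use.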

openeuler / Kernel synced successfully about 2 years ago
