Commits · fa4a427d84f9b797970a3d5139d7645403e4e989 · openeuler / Kernel

15 Dec, 2021 1 commit

drm/amdgpu: SRIOV flr_work should use down_write · fa4a427d

Authored by Victor Skvortsov on Dec 13, 2021

Host-initiated VF FLR may fail if someone else is
already holding a read_lock. Change from down_write_trylock
to down_write to guarantee the reset goes through.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fa4a427d

14 Dec, 2021 27 commits

drm:amdgpu:remove unneeded variable · 47d9c6fa

Authored by chiminghao on Dec 09, 2021

Return the value directly instead of
storing it in another redundant variable.
Reported-by: Zeal Robot <zealci@zte.com.cm>
Signed-off-by: chiminghao <chi.minghao@zte.com.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
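
The kind of cleanup this commit describes can be sketched as below. This is only an illustration: do_hw_step(), step_before() and step_after() are hypothetical names and are not taken from the amdgpu sources.

```c
#include <assert.h>

/* Hypothetical stand-in for a helper that returns 0 or a negative errno. */
static int do_hw_step(int arg)
{
    return arg < 0 ? -22 : 0; /* -22 mirrors -EINVAL on Linux */
}

/* Before: the result is parked in a local that is only returned. */
int step_before(int arg)
{
    int ret;

    ret = do_hw_step(arg);
    return ret;
}

/* After: the call result is returned directly. */
int step_after(int arg)
{
    return do_hw_step(arg);
}
```

Both variants behave identically; the second simply drops the intermediate variable.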

47d9c6fa

drm/amdgpu: re-format file header comments · ba6f8c13

Authored by Isabella Basso on Dec 09, 2021

Fix the warning below:

 warning: Cannot understand  * \file amdgpu_ioc32.c
 on line 2 - I thought it was a doc line

Changes since v1:
- As suggested by Alexander Deucher:
  1. Reduce diff to minimum as this DOC section doesn't provide much
     value.
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ba6f8c13

drm/amdgpu: fix amdgpu_ras_mca_query_error_status scope · 929bb8e2

Authored by Isabella Basso on Dec 09, 2021

This commit fixes the compile-time warning below:

 warning: no previous prototype for ‘amdgpu_ras_mca_query_error_status’
 [-Wmissing-prototypes]

Changes since v1:
- As suggested by Alexander Deucher:
  1. Make function static instead of adding prototype.
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
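
As a rough illustration of why making the function static silences -Wmissing-prototypes, consider the sketch below. The function names are hypothetical and the bodies are placeholders; only the static-versus-prototype distinction reflects the commit text.

```c
#include <assert.h>

/*
 * With -Wmissing-prototypes, a non-static function defined without a prior
 * declaration triggers "no previous prototype for ...". A file-local helper
 * marked static is exempt, so no header declaration is needed.
 */
static int query_error_status(int raw)
{
    return raw & 0xff; /* placeholder logic */
}

/* A function that must stay global needs a visible prototype instead. */
int report_error_status(int raw);

int report_error_status(int raw)
{
    return query_error_status(raw);
}
```

Making a function static is the smaller change when it is only used inside one file, which matches the v1 review suggestion quoted above.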

929bb8e2

drm/amdgpu: Reduce SG bo memory usage for mGPUs · 28fe4164

Authored by Philip Yang on Dec 06, 2021

For a userptr bo, if adev is not in IOMMU isolation mode, RAM is direct-mapped
to the GPU and multiple GPUs use the same system memory DMA mapping address, so
they can share the original mem->bo in the attachment to reduce DMA address
array memory usage.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

28fe4164

drm/amdgpu: Detect if amdgpu in IOMMU direct map mode · 4a74c38c

Authored by Philip Yang on Dec 06, 2021

If the host and amdgpu IOMMU are not enabled, or the IOMMU is in pass-through
mode, set the adev->ram_is_direct_mapped flag, which will be used to optimize
memory usage for multi-GPU mappings.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4a74c38c

drm/amdgpu: add support for SMU debug option · 6ff7fddb

Authored by Lang Yu on Nov 30, 2021

SMU firmware expects the driver to maintain the error context
and not interact with SMU any more once SMU errors have
occurred. That will aid in debugging SMU firmware issues.

Add SMU debug option support for this request; it can be
enabled or disabled via the amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate the corresponding debug modes.
Currently, only one mode (HALT_ON_ERROR) is supported.
When enabled, it brings the hardware to a kind of halt state
so that no one can touch it any more in the event of SMU
errors.

The driver interacts with SMU by sending messages, and
there are three ways of sending messages to SMU in the current
implementation. Handle them respectively as follows:

1, smu_cmn_send_smc_msg_with_param() for normal timeout cases

  Halt on any error.

2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases

  Halt on errors apart from ETIME. Otherwise this way won't work.
  Let the user handle ETIME error in such a case.

3, smu_cmn_send_msg_without_waiting() for no waiting cases

  Halt on errors apart from ETIME. Otherwise the second way won't work.
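
The three cases above can be summarized in a small decision function. This is a minimal sketch rather than the driver's code: the enum, the macro name and should_halt() are hypothetical, and only the HALT_ON_ERROR mode, its 0x1 mask value and the ETIME exception come from this commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro name; bit 0 matches the 0x1 written to debugfs. */
#define DEBUG_HALT_ON_ERROR 0x1u

/* Hypothetical labels for the three sending paths listed above. */
enum send_path {
    SEND_WITH_PARAM, /* 1: normal timeout cases */
    SEND_THEN_WAIT,  /* 2: longer timeout cases */
    SEND_NO_WAIT,    /* 3: no waiting cases */
};

/* Decide whether an SMU messaging error should halt the device. */
bool should_halt(uint32_t debug_mask, enum send_path path, int err)
{
    if (!(debug_mask & DEBUG_HALT_ON_ERROR) || err == 0)
        return false;
    if (path == SEND_WITH_PARAM)
        return true;      /* halt on any error */
    return err != -ETIME; /* paths 2 and 3 leave ETIME to the caller */
}
```

Keeping the mode in a bit mask means further debug modes could be added later without changing the debugfs interface, which is the motivation given in the v5 note.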

== Command Guide ==

1, enable HALT_ON_ERROR mode

 # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable HALT_ON_ERROR mode

 # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v5:
 - Use bit mask to allow more debug features.(Evan)
 - Use WARN() instead of BUG().(Evan)

v4:
 - Set to halt state instead of a simple hang.(Christian)

v3:
 - Use debugfs_create_bool().(Christian)
 - Put variable into smu_context struct.
 - Don't resend command when timeout.

v2:
 - Resend command when timeout.(Lijo)
 - Use debugfs file instead of module parameter.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6ff7fddb

drm/amdgpu: introduce a kind of halt state for amdgpu device · 34f3a4a9

Authored by Lang Yu on Dec 09, 2021

It is useful to maintain error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It will bring hardware to a kind of halt state,
so that no one can touch it any more.

Compared to a simple hang, the system will stay stable,
at least for SSH access. Then it should be trivial to
inspect the hardware state and see what's going on.

v2:
 - Set adev->no_hw_access earlier to avoid potential crashes.(Christian)
Suggested-by: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.co>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

34f3a4a9

drm/amdgpu: check df_funcs and its callback pointers · cace4bff

Authored by Hawking Zhang on Nov 25, 2021

in case they are not available in the early phase
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cace4bff

drm/amdgpu: don't override default ECO_BITs setting · 4ac955ba

Authored by Hawking Zhang on Dec 04, 2021

Leave this bit at the hardware default setting
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4ac955ba

drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORE · 2c113b99

Authored by Le Ma on Dec 04, 2021

The register offset should be based on the GC IP base address
Signed-off-by: Le Ma <le.ma@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2c113b99

drm/amdgpu: read and authenticate ip discovery binary · 2cb6577a

Authored by Hawking Zhang on Nov 22, 2021

Read and authenticate the IP discovery binary obtained from
VRAM first; if it is not valid, read and authenticate
the one obtained from file
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
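
The fallback order described here might look roughly like the sketch below. All names are hypothetical, and the validity check is a deliberately trivial stand-in (an arbitrary magic first byte), since the commit message does not spell out how the binary is authenticated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum discovery_source { SRC_NONE, SRC_VRAM, SRC_FILE };

/* Placeholder check: the arbitrary magic byte 0x28 stands in for the real
 * signature/checksum verification. */
static bool binary_is_valid(const uint8_t *buf, size_t len)
{
    return buf != NULL && len > 0 && buf[0] == 0x28;
}

/* Prefer the VRAM copy; fall back to the file copy only if VRAM is invalid. */
enum discovery_source pick_discovery_source(const uint8_t *from_vram, size_t vram_len,
                                            const uint8_t *from_file, size_t file_len)
{
    if (binary_is_valid(from_vram, vram_len))
        return SRC_VRAM;
    if (binary_is_valid(from_file, file_len))
        return SRC_FILE;
    return SRC_NONE;
}
```

Both copies go through the same validation, so a corrupt file is rejected just like a corrupt VRAM image.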

2cb6577a

drm/amdgpu: add helper to verify ip discovery binary signature · 32f0e1a3

Authored by Hawking Zhang on Nov 22, 2021

To be used to check the IP discovery binary signature
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

32f0e1a3

drm/amdgpu: rename discovery_read_binary helper · f6dcaf0c

Authored by Hawking Zhang on Nov 22, 2021

Add _from_vram to the function name to differentiate it from
the one used to read from file
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f6dcaf0c

drm/amdgpu: add helper to load ip_discovery binary from file · 43a80bd5

Authored by Hawking Zhang on Nov 23, 2021

To be used when the ip_discovery binary is not carried by the vbios
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

43a80bd5

drm/amdgpu: fix incorrect VCN revision in SRIOV · c40bdfb2

Authored by Leslie Shi on Dec 08, 2021

The guest OS will set up VCN instance 1, which is disabled, as an enabled
instance and execute initialization work on it, but this causes a VCN IB ring
test failure on the disabled VCN instance during modprobe:

amdgpu 0000:00:08.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 5 on hub 1
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_dec_0 (-110).
amdgpu 0000:00:08.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc_0.0 (-110).
[drm:amdgpu_device_delayed_init_work_handler [amdgpu]] *ERROR* ib ring test failed (-110).

v2: drop amdgpu_discovery_get_vcn_version and rename sriov_config to
vcn_config
v3: modify VCN's revision in SR-IOV and bare-metal

Fixes: baf3f8f3 ("drm/amdgpu: handle SRIOV VCN revision parsing")
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c40bdfb2

drm/amdgpu: add modifiers in amdgpu_vkms_plane_init() · 4046afce

Authored by Leslie Shi on Dec 06, 2021

Fix the following warning in SRIOV during modprobe:

amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier
WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu]
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4046afce

drm/amdgpu: only hw fini SMU first for ASICs need that · 613aa3ea

Authored by Lang Yu on Dec 03, 2021

We ran into some headaches on ASICs that don't need that,
so remove it for them.
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Kevin Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

613aa3ea

drm/amdgpu: Handle fault with same timestamp · 0771c805

Authored by Philip Yang on Dec 08, 2021

Remove the non-unique timestamp WARNING, as interrupts with the same
timestamp happen on some chips.

Draining faults needs to wait for the processed_timestamp to be truly greater
than the checkpoint, or for the ring to be empty, to be sure no stale faults
are handled.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1818
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
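
The drain condition can be illustrated with a small predicate. This is a sketch under assumptions: ts_after() and faults_drained() are hypothetical names, and treating timestamps as wrapping 48-bit counters is an assumption made here for the comparison, not something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: timestamps are 48-bit counters that can wrap. Shifting them
 * into the top of a 64-bit word makes the signed difference wrap-aware. */
static bool ts_after(uint64_t earlier, uint64_t later)
{
    return (int64_t)((later << 16) - (earlier << 16)) > 0;
}

/* Draining is complete only when the processed timestamp is strictly past
 * the checkpoint, or when the ring has nothing left to process. */
bool faults_drained(uint64_t processed_ts, uint64_t checkpoint_ts, bool ring_empty)
{
    return ring_empty || ts_after(checkpoint_ts, processed_ts);
}
```

An equal timestamp does not count as drained, because another fault carrying the same timestamp may still be queued behind the one just processed.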

0771c805

drm/amdgpu: fix location of prototype for amdgpu_kms_compat_ioctl · e105b64a

Authored by Isabella Basso on Dec 07, 2021

This fixes the warning below by changing the prototype to a location
that's actually included by the .c files that call
amdgpu_kms_compat_ioctl:

 warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’
 [-Wmissing-prototypes]
 37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
    |      ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e105b64a

drm/amd: append missing includes · 64cf26f0

Authored by Isabella Basso on Dec 07, 2021

This fixes warnings caused by global functions lacking prototypes, such as:

 warning: no previous prototype for 'dcn303_hw_sequencer_construct'
 [-Wmissing-prototypes]
 12 | void dcn303_hw_sequencer_construct(struct dc *dc)
    |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 ...
 warning: no previous prototype for ‘amdgpu_has_atpx’
 [-Wmissing-prototypes]
 76 | bool amdgpu_has_atpx(void) {
    |      ^~~~~~~~~~~~~~~
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

64cf26f0

drm/amdgpu: fix function scopes · 2351b7d4

Authored by Isabella Basso on Dec 07, 2021

This turns previously global functions into static, thus removing
compile-time warnings such as:

 warning: no previous prototype for 'amdgpu_vkms_output_init' [-Wmissing-prototypes]
 399 | int amdgpu_vkms_output_init(struct drm_device *dev,
     |     ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2351b7d4

drm/amd: fix improper docstring syntax · bbe04dec

Authored by Isabella Basso on Dec 07, 2021

This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:

 warning: Function parameter or member 'adev' not described in
 'amdgpu_atomfirmware_ras_rom_addr'
 ...
 warning: expecting prototype for amdgpu_atpx_validate_functions().
 Prototype was for amdgpu_atpx_validate() instead
 ...
 warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
 ...
 warning: Cannot understand  * @kfd_get_cu_occupancy - Collect number of
 waves in-flight on this device
 ...
 warning: This comment starts with '/**', but isn't a kernel-doc
 comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bbe04dec

drm/amdgpu: extended waiting SRIOV VF reset completion timeout to 10s · 85a774d9

Authored by Zhigang Luo on Dec 06, 2021

ASICs with a big FB need more time to clear the FB during reset.
This change extends the timeout for an SRIOV VF waiting for reset completion
from 5s to 10s.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Acked-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

85a774d9

drm/amdgpu: recover XGMI topology for SRIOV VF after reset · a5f67c93

Authored by Zhigang Luo on Dec 06, 2021

For an SRIOV VF, the XGMI topology was not recovered after reset. This
change adds code to the SRIOV VF reset function to update the XGMI topology
for the SRIOV VF after reset.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a5f67c93

drm/amdgpu: added PSP XGMI initialization for SRIOV VF during recover · dd26e018

Authored by Zhigang Luo on Dec 06, 2021

For an SRIOV VF, XGMI was not initialized in PSP during recovery. This change
adds PSP XGMI initialization for the SRIOV VF during recovery.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dd26e018

drm/amdgpu: skip reset other device in the same hive if it's SRIOV VF · 175ac6ec

Authored by Zhigang Luo on Nov 26, 2021

On SRIOV, the host driver can support FLR (function level reset) on an
individual VF within the hive, which might bring the individual device back to
normal without the need to execute a hive reset. If the FLR fails, the host
driver will trigger the hive reset, and each guest VF will get a reset
notification before the real hive reset is executed. The VF device can handle
the reset request individually in its reset work handler.

This change updates the GPU recovery sequence to skip resetting other devices
in the same hive for an SRIOV VF.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

175ac6ec

drm/amdgpu: enable RAS poison flag when GPU is connected to CPU · 655ff353

Authored by Tao Zhou on Dec 08, 2021

The RAS poison mode is enabled by default on the platform.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

655ff353

08 Dec, 2021 6 commits

drm/amdgpu: replace drm_detect_hdmi_monitor() with drm_display_info.is_hdmi · 3c021931

Authored by Claudio Suarez on Oct 17, 2021

Once EDID is parsed, the monitor HDMI support information is available
through drm_display_info.is_hdmi. The amdgpu driver still calls
drm_detect_hdmi_monitor() to retrieve the same information, which
is less efficient. Change to drm_display_info.is_hdmi.

This is a TODO task in Documentation/gpu/todo.rst
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3c021931

drm/amdgpu: update drm_display_info correctly when the edid is read · 20543be9

Authored by Claudio Suarez on Oct 17, 2021

drm_display_info is updated by drm_get_edid() or
drm_connector_update_edid_property(). In the amdgpu driver it is almost
always updated when the edid is read in amdgpu_connector_get_edid(),
but not always.  Change amdgpu_connector_get_edid() and
amdgpu_connector_free_edid() to keep drm_display_info updated.
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Claudio Suarez <cssk@net-c.es>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

20543be9

drm/amdgpu: skip umc ras error count harvest · cf63b702

Authored by Stanley.Yang on Dec 07, 2021

Remove the in-recovery state check and skip the UMC RAS error count
harvest in amdgpu_ras_log_on_err_counter
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cf63b702

drm/amdgpu: free vkms_output after use · 30c1e391

Authored by Flora Cui on Dec 02, 2021

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

30c1e391

drm/amdgpu: drop the critical WARN_ON in amdgpu_vkms · f7ed3f90

Authored by Flora Cui on Nov 24, 2021

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Leslie Shi <Yuliang.Shi@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f7ed3f90

drm/amdgpu: only skip get ecc info for aldebaran · aed1faab

Authored by Stanley.Yang on Dec 03, 2021

Skip getting ECC info for aldebaran by checking the IP version,
so that other ASIC types are not affected
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

aed1faab

03 Dec, 2021 3 commits

drm/amdgpu: Fix a NULL pointer dereference in amdgpu_connector_lcd_native_mode() · b220110e

Authored by Zhou Qingyang on Dec 03, 2021

In amdgpu_connector_lcd_native_mode(), the return value of
drm_mode_duplicate() is assigned to mode, and there is a dereference
of it in amdgpu_connector_lcd_native_mode(), which will lead to a NULL
pointer dereference on failure of drm_mode_duplicate().

Fix this bug by adding a check of mode.

This bug was found by a static analyzer. The analysis employs
differential checking to identify inconsistent security operations
(e.g., checks or kfrees) between two code paths and confirms that the
inconsistent operations are not recovered in the current function or
the callers, so they constitute bugs.

Note that, as a bug found by static analysis, it can be a false
positive or hard to trigger. Multiple researchers have cross-reviewed
the bug.

Builds with CONFIG_DRM_AMDGPU=m show no new warnings, and
our static analyzer no longer warns about this code.

Fixes: d38ceaf9 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
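
The pattern behind this fix is a NULL check between a duplicating call that can fail and the first dereference of its result. A minimal user-space sketch follows; struct mode, mode_duplicate() and native_mode() are hypothetical stand-ins and do not reproduce the DRM structures or drm_mode_duplicate() itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced mode description. */
struct mode {
    int hdisplay;
    int vdisplay;
    int preferred;
};

/* Stand-in for a duplicating helper that may fail and return NULL. */
static struct mode *mode_duplicate(const struct mode *src, int simulate_failure)
{
    struct mode *copy;

    if (simulate_failure)
        return NULL;
    copy = malloc(sizeof(*copy));
    if (copy)
        memcpy(copy, src, sizeof(*copy));
    return copy;
}

/* Returns a heap copy marked as preferred, or NULL if duplication failed.
 * The caller owns the returned memory. */
struct mode *native_mode(const struct mode *src, int simulate_failure)
{
    struct mode *m = mode_duplicate(src, simulate_failure);

    if (!m) /* the added check: bail out before touching m */
        return NULL;
    m->preferred = 1;
    return m;
}
```

Without the check, the assignment to m->preferred would dereference NULL whenever the duplicate could not be allocated.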

b220110e

drm/amdgpu: handle SRIOV VCN revision parsing · baf3f8f3

Authored by Alex Deucher on Nov 30, 2021

For SR-IOV, the IP discovery revision number encodes
additional information.  Handle that case here.

v2: drop additional IP versions
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

baf3f8f3

drm/amdgpu: skip query ecc info in gpu recovery · bab73f09

Authored by Stanley.Yang on Dec 02, 2021

This is a workaround for getting ECC info failing during GPU recovery:

[  700.236122] amdgpu 0000:09:00.0: amdgpu: Failed to export SMU ecc table!
[  700.236128] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.331171] amdgpu: qcm fence wait loop timeout expired
[  704.331194] amdgpu: The cp might be in an unrecoverable state due to an unsuccessful queues preemption
[  704.332445] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[  704.332448] amdgpu 0000:09:00.0: amdgpu: Bailing on TDR for s_job:ffffffffffffffff, as another already in progress
[  704.332456] amdgpu: Pasid 0x8000 destroy queue 0 failed, ret -62
[  710.360924] amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000013 SMN_C2PMSG_82:0x00000007
[  710.360964] amdgpu 0000:09:00.0: amdgpu: Failed to disable smu features.
[  710.361002] amdgpu 0000:09:00.0: amdgpu: Fail to disable dpm features!
[  710.361014] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bab73f09

02 Dec, 2021 3 commits

drm/amdgpu: update fw_load_type module parameter doc to match code · ddb267b6

Authored by Yann Dirson on Nov 29, 2021

amdgpu_ucode_get_load_type() does not interpret this parameter as
documented.  It is ignored for many ASIC types (which presumably
only support one load_type), and when not ignored it is only used
to force direct loading instead of PSP loading.  SMU loading is
only available for ASICs for which the parameter is ignored.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ddb267b6

drm/amdkfd: err_pin_bo path leaks kfd_bo_list · a899fe8b

Authored by Philip Yang on Nov 29, 2021

Refactor the userptr and pin_bo paths to make them less confusing, and move
the err_pin_bo label up to remove mem from the process_info kfd_bo_list.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a899fe8b

drm/amdgpu: adjust the kfd reset sequence in reset sriov function · 992110d7

Authored by shaoyunl on Nov 29, 2021

This change reverts the following previous commits:
9f4f2c1a ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
271fd38c ("drm/amdgpu: move kfd post_reset out of reset_sriov function")

This change moves the amdgpu_amdkfd_pre_reset to an earlier place
in amdgpu_device_reset_sriov, presumably to address the sequence issue
that the first patch was originally meant to fix.

Some register accesses (GRBM_GFX_CNTL) are only allowed in full access
mode. Move kfd_pre_reset and kfd_post_reset back inside the reset_sriov
function.

Fixes: 9f4f2c1a ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
Fixes: 271fd38c ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

992110d7

openeuler / Kernel synced successfully more than 1 year ago