Commits · cace4bff750ff4f55b16c3aa90aa9376d7488929 · openeuler / Kernel

14 Dec, 2021 1 commit

drm/amdgpu: check df_funcs and its callback pointers · cace4bff

Authored by Hawking Zhang on Nov 25, 2021

in case they are not available in early phase
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cace4bff

23 Nov, 2021 1 commit

drm/amd/amdgpu: fix potential memleak · 7b833d68

Authored by Bernard Zhao on Nov 14, 2021

In function amdgpu_get_xgmi_hive, when kobject_init_and_add fails,
there is a potential memory leak if kobject_put is not called.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7b833d68

06 Nov, 2021 1 commit

drm/amdgpu: correct xgmi ras error count reset · 7513c9ff

Authored by Tao Zhou on Nov 04, 2021

The error count reset for xgmi3x16 pcs is missed.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7513c9ff

27 Aug, 2021 1 commit

drm/amdgpu: Add support for RAS XGMI err query · 3c4ff2dc

Authored by John Clements on Aug 26, 2021

Update XGMI RAS to support error query on aldebaran
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3c4ff2dc

25 Aug, 2021 1 commit

drm/amdgpu: Update RAS XGMI Error Query · f24d991b

Authored by John Clements on Aug 24, 2021

Resolve bug querying error on unsupported ASIC
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f24d991b

19 Aug, 2021 1 commit

drm/amdgpu: get extended xgmi topology data · 44357a1b

Authored by Jonathan Kim on Aug 03, 2021

The TA has a limit to the amount of data that can be retrieved from
GET_TOPOLOGY.  For setups that exceed this limit, the xGMI topology
needs to be re-initialized and data needs to be re-fetched from the
extended link records by setting a flag in the shared command buffer.

The number of hops and the number of links must be accumulated by the
driver. Other data points are all fetched from the first request.
Because the TA has already exceeded its link record limit, it
cannot hold bidirectional information.  Otherwise the driver would
have to do more than two fetches so the driver has to reflect the
topology information in the opposite direction.

v2: squashed with internal reviewed fix
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

44357a1b

17 Aug, 2021 1 commit

drm/amd/amdgpu: remove unnecessary RAS context field · 893cf382

Authored by Candice Li on Aug 13, 2021

Delete ras_if->name in the RAS ctx structure and remove related lines.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

893cf382

23 Jul, 2021 1 commit

drm/amdkfd: report xgmi bandwidth between direct peers to the kfd · 3f46c4e9

Authored by Jonathan Kim on May 12, 2021

Report the min/max bandwidth in megabytes to the kfd for direct
xgmi connections only.  Indirect peers will report 0 since
the indirect route is unknown.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3f46c4e9

10 Apr, 2021 3 commits

drm/amdgpu: move xgmi ras functions to xgmi_ras_funcs · 52137ca8

Authored by Hawking Zhang on Mar 18, 2021

xgmi ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. move all xgmi ras
functions to xgmi_ras_funcs so gpu driver only
initializes xgmi ras functions when it manages
xgmi ras.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

52137ca8

drm/amdgpu: Convert sysfs sprintf/snprintf family to sysfs_emit · 36000c7a

Authored by Tian Tao on Mar 24, 2021

Fix the following coccicheck warning:
drivers/gpu//drm/amd/amdgpu/amdgpu_ras.c:434:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:220:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_xgmi.c:249:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/df_v3_6.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_psp.c:2973:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:75:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:112:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:58:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:93:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_vram_mgr.c:125:9-17: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:52:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_gtt_mgr.c:71:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:140:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:164:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:186:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_device.c:208:8-16: WARNING:
use scnprintf or sprintf
drivers/gpu//drm/amd/amdgpu/amdgpu_atombios.c:1916:8-16: WARNING:
use scnprintf or sprintf
Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

36000c7a

drm/amd/pm: label these APIs used internally as static · c6ce68e6

Authored by Evan Quan on Mar 19, 2021

Also drop unnecessary header file and declarations.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c6ce68e6

24 Mar, 2021 2 commits

drm/amdgpu: Reset the devices in the XGMI hive duirng probe · e3c1b071

Authored by shaoyunl on Feb 16, 2021

In passthrough configuration, the hypervisor will trigger the SBR (Secondary Bus Reset) to the devices
without syncing them with each other. This could cause a device hang since, for XGMI configuration, all the devices
within the hive need to be reset within a limited time slot. This series of patches tries to solve this issue
by co-operating with the new SMU, which will only do minimum housekeeping to respond to the SBR request but won't
do the real reset job and leaves it to the driver. The driver needs to do the whole sw init and minimum HW init
to bring up the SMU and trigger the reset (possibly BACO) on all the ASICs at the same time.
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e3c1b071

drm/amdgpu: mask the xgmi number of hops reported from psp to kfd · 4ac5617c

Authored by Jonathan Kim on Jan 27, 2021

The psp supplies the link type in the upper 2 bits of the psp xgmi node
information num_hops field. With a new link type, Aldebaran has these
bits set to a non-zero value (1 = xGMI3) so the KFD topology will report
the incorrect IO link weights without proper masking.
The actual number of hops is located in the 3 least significant bits of
this field so mask it off accordingly before passing it to the KFD.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Amber Lin <amber.lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4ac5617c
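
The bit layout described in commit 4ac5617c can be sketched in plain C as follows. This is only an illustration, not the driver's code: the macro and function names are hypothetical, and it assumes the num_hops field is 8 bits wide, so that the "upper 2 bits" are bits 7:6.

```c
#include <stdint.h>

/* Hypothetical names for illustration; the layout follows the commit
 * message, assuming an 8-bit num_hops field. */
#define XGMI_NUM_HOPS_MASK   0x07u /* 3 least significant bits: hop count */
#define XGMI_LINK_TYPE_SHIFT 6     /* upper 2 bits of an 8-bit field */
#define XGMI_LINK_TYPE_MASK  0x03u

/* Strip the link-type bits so only the hop count remains. */
static inline uint8_t xgmi_hop_count(uint8_t raw_num_hops)
{
    return raw_num_hops & XGMI_NUM_HOPS_MASK;
}

/* Extract the link type carried in the upper 2 bits. */
static inline uint8_t xgmi_link_type(uint8_t raw_num_hops)
{
    return (raw_num_hops >> XGMI_LINK_TYPE_SHIFT) & XGMI_LINK_TYPE_MASK;
}
```

Under these assumptions, a raw value of 0x42 decodes to link type 1 and 2 hops. Without the mask, the same value would be read as 66 hops, which is the kind of wrong IO link weight the commit describes.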

10 Feb, 2021 1 commit

drm/amdgpu: optimize list operation in amdgpu_xgmi · be8901c2

Authored by Kevin Wang on Feb 03, 2021

simplify the list operation.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

be8901c2

13 Nov, 2020 1 commit

drm/amdgpu: check hive pointer before access · a9f5f98f

Authored by Hawking Zhang on Sep 05, 2020

in case it is an invalid one
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a9f5f98f

10 Oct, 2020 1 commit

drm/amdgpu: Fix inconsistent of format with argument type in amdgpu_xgmi.c · 73e34336

Authored by Ye Bin on Oct 09, 2020

Fix the following warning:
[drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c:249]: (warning) %d in format
string (no. 1) requires 'int' but the argument type is 'unsigned int'.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

73e34336

25 Aug, 2020 4 commits

drm/amdgpu: Get DRM dev from adev by inline-f · 4a580877

Authored by Luben Tuikov on Aug 24, 2020

Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4a580877

drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) · 1348969a

Authored by Luben Tuikov on Aug 24, 2020

Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
    instead of an invisible macro.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1348969a
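
As a rough illustration of the two helpers named in commits 4a580877 and 1348969a, the sketch below uses typed static inline functions rather than macros. The struct definitions are minimal stand-ins, and the fields used to link the two objects (`dev_private` and `ddev`) are assumptions made for this example; the real kernel structures are far larger and their layout may differ.

```c
struct amdgpu_device;

/* Minimal stand-in for the DRM device; only the field used here. */
struct drm_device {
    void *dev_private; /* assumed back-pointer to the driver's device */
};

/* Minimal stand-in for the amdgpu device. */
struct amdgpu_device {
    struct drm_device *ddev; /* assumed pointer to the DRM device */
};

/* Typed conversion: the compiler checks the argument type, which an
 * untyped macro would not do. */
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
    return ddev->dev_private;
}

static inline struct drm_device *adev_to_drm(struct amdgpu_device *adev)
{
    return adev->ddev;
}
```

One benefit of this shape is that callers no longer touch the linking fields directly, so the way the two objects are connected can change later without editing every call site.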

drm/amdgpu: refine create and release logic of hive info · d95e8e97

Authored by Dennis Li on Aug 18, 2020

Change to dynamically create and release hive info object,
which help driver support more hives in the future.

v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.

v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d95e8e97

drm/amdgpu: refine codes to avoid reentering GPU recovery · 53b3f8f4

Authored by Dennis Li on Aug 19, 2020

if other threads have held the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

53b3f8f4
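
The idea in commit 53b3f8f4 of using an atomic flag to keep a second caller from re-entering recovery can be sketched in userspace C11 as follows. This is an analogy, not the driver's implementation: `struct gpu_dev` and the three helper functions are hypothetical, and C11 `<stdatomic.h>` stands in for the kernel's atomic API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device with an atomic "reset in progress" flag. */
struct gpu_dev {
    atomic_int in_gpu_reset;
};

/* Try to claim recovery: only the caller that flips 0 -> 1 proceeds;
 * any concurrent or nested caller sees false and must back off. */
static bool gpu_recovery_begin(struct gpu_dev *dev)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&dev->in_gpu_reset, &expected, 1);
}

/* Release the claim once recovery has finished. */
static void gpu_recovery_end(struct gpu_dev *dev)
{
    atomic_store(&dev->in_gpu_reset, 0);
}

/* The comparison already yields a boolean, so no "? true : false"
 * is needed around it. */
static bool gpu_in_reset(struct gpu_dev *dev)
{
    return atomic_load(&dev->in_gpu_reset) != 0;
}
```

Unlike a trylock on a lock that other threads may hold for unrelated reasons, the flag only ever reflects whether a recovery is already running, so a failed claim has a single meaning.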

15 Aug, 2020 1 commit

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

Authored by Christian König on Aug 12, 2020

The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f1403342

08 Aug, 2020 1 commit

drm/amdgpu: unlock mutex on error · 94561899

Authored by Dennis Li on Aug 04, 2020

Make sure to unlock the mutex when an error happens

v2:
1. correct syntax error in the commit comments
2. remove change-Id
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

94561899

28 Jul, 2020 1 commit

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

Authored by Dennis Li on Jul 08, 2020

when the GPU hangs, the driver has multiple paths to enter amdgpu_device_gpu_recover;
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe for other threads to access the GPU,
which may cause the GPU reset to fail. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protects the GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

df9c8d1a

01 Jul, 2020 1 commit

drm/amdgpu: remove unused functions · 683fc63d

Authored by Nirmoy Das on Jun 18, 2020

Remove unused amdgpu_xgmi_hive_try_lock() and smu7_reset_asic_tasks().
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

683fc63d

22 May, 2020 1 commit

drm/amdgpu fix incorrect sysfs remove behavior for xgmi · a89b5dae

Authored by Jack Zhang on May 18, 2020

Under an xgmi setup, some sysfs nodes fail to be created the second time
the kmd driver is loaded. This is because the sysfs nodes were not removed
appropriately at the last unload.

Changes of this patch:
1. remove sysfs for dev_attr_xgmi_error
2. remove sysfs_link adev->dev->kobj with target name.
   And it only needs to be removed once for a xgmi setup
3. remove sysfs_link hive->kobj with target name

In amdgpu_xgmi_remove_device:
1. amdgpu_xgmi_sysfs_rem_dev_info needs to be run per device
2. amdgpu_xgmi_sysfs_destroy needs to be run on the last node of
device.

v2: initialize array with memset
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a89b5dae

15 May, 2020 1 commit

drm/amdgpu: remove redundant assignment to variable ret · 29c1ec24

Authored by Colin Ian King on May 12, 2020

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29c1ec24

09 May, 2020 1 commit

drm/amdgpu: use node_id and node_size to calcualte dram_base_address · 890900fe

Authored by Hawking Zhang on May 04, 2020

physical_node_id * node_segment_size should be the
dram_base_address for current gpu node in xgmi config
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

890900fe
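
The formula stated in commit 890900fe is small enough to show directly. The function name and parameter types below are assumptions made for illustration; only the product itself comes from the commit message.

```c
#include <stdint.h>

/* dram_base_address = physical_node_id * node_segment_size.
 * The cast widens the node id first so the multiplication is done in
 * 64 bits and cannot wrap at 32 bits for large segment sizes. */
static inline uint64_t xgmi_dram_base_address(uint32_t physical_node_id,
                                              uint64_t node_segment_size)
{
    return (uint64_t)physical_node_id * node_segment_size;
}
```

For example, with a hypothetical 32 GiB segment per node, node 0 starts at address 0 and node 3 starts at 96 GiB.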

28 Apr, 2020 1 commit

drm/amdgpu: sw pstate switch should only be for vega20 · dfe31f25

Authored by Jonathan Kim on Apr 24, 2020

Driver steered p-state switching is designed for Vega20 only.
Also simplify early return for temporary disable due to SMU FW
bug.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dfe31f25

23 Apr, 2020 1 commit

drm/amdgpu: fix race between pstate and remote buffer map · d84a430d

Authored by Jonathan Kim on Mar 17, 2020

Vega20 arbitrates pstate at hive level and not device level. Last peer to
remote buffer unmap could drop P-State while another process is still
remote buffer mapped.

With this fix, P-States still need to be disabled for now as an SMU bug
was discovered on synchronous P2P transfers.  This should be fixed in the
next FW update.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d84a430d

02 Apr, 2020 1 commit

drm/amdgpu: added xgmi ras error reset sequence · 66399248

Authored by John Clements on Mar 25, 2020

added mechanism to clear xgmi ras status in between error queries
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

66399248

11 Mar, 2020 1 commit

drm/amdgpu: call ras_debugfs_create_all in debugfs_init · 204eaac6

Authored by Tao Zhou on Mar 06, 2020

and remove each ras IP's own debugfs creation

this is required to fix ras when the driver does not use the drm load
and unload callbacks due to ordering issues with the drm device node.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

204eaac6

07 Mar, 2020 2 commits

drm/amdgpu: enable PCS error report on arcturus · a61f41b1

Authored by Hawking Zhang on Feb 21, 2020

add arcturus xgmi/wafl pcs err status group to support
PCS error detection and report on arcturus
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a61f41b1

drm/amdgpu: add helper funcs to detect PCS error · 18f36157

Authored by Hawking Zhang on Feb 21, 2020

Starting from vega20, hardware supports run-time detection
and reporting of XGMI/WAFL PCS ras errors. Add helper functions
to walk through every type of ras error and report it if
any.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

18f36157

27 Feb, 2020 2 commits

drm/amdgpu: toggle DF-Cstate to protect DF reg access · 938065d4

Authored by Hawking Zhang on Feb 24, 2020

driver needs to take DF out of Cstate before any DF register
access. otherwise, the DF register may not be accessible.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

938065d4

drm/amdgpu: move get_xgmi_relative_phy_addr to amdgpu_xgmi.c · 19744f5f

Authored by Hawking Zhang on Feb 24, 2020

centralize all the xgmi related functions in amdgpu_xgmi.c
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

19744f5f

07 Feb, 2020 1 commit

drm/amdgpu: move xgmi init/fini to xgmi_add/remove_device call (v2) · 0b9d3760

Authored by Hawking Zhang on Dec 23, 2019

For sriov, psp ip block has to be initialized before
ih block for the dynamic register programming interface
that is needed for the vf ih ring buffer. On the other hand,
the current psp ip block hw_init function will initialize
the xgmi session, which actually depends on an interrupt to
return the session context. This results in an empty xgmi ta
session id and later failures on all the xgmi ta cmds
invoked from vf. xgmi ta session initialization has to
be done after the ih ip block hw_init call.

to unify xgmi session init/fini for both the bare-metal and
sriov virtualization use scenarios, move xgmi ta init
to the xgmi_add_device call, and accordingly terminate the xgmi
ta session in the xgmi_remove_device call.

The existing suspend/resume sequence will not be changed.

v2: squash in return fix from Nirmoy
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0b9d3760

14 Jan, 2020 2 commits

drm/amdgpu: Create generic DF struct in adev · bdf84a80

Authored by Joseph Greathouse on Jan 14, 2020

The only data fabric information the adev struct currently
contains is a function pointer table. In the near future,
we will be adding some cached DF information into adev. As
such, this patch creates a new amdgpu_df struct for adev.
Right now, it only contains the old function pointer table,
but new stuff will be added soon.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bdf84a80

drm/amd/powerplay: cover the powerplay implementation details V3 · 9530273e

Authored by Evan Quan on Jan 07, 2020

This can save users much trouble, as they do not
actually need to care whether swSMU or traditional
powerplay routine should be used.

V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9530273e

19 Dec, 2019 1 commit

drm/amdgpu: Add task barrier to XGMI hive. · f33a8770

Authored by Andrey Grodzovsky on Dec 06, 2019

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f33a8770

08 Nov, 2019 1 commit

drm/amdgpu: fix vega20 pstate status change · cb5932f8

Authored by Jonathan Kim on Nov 06, 2019

vega20 only requires all devices be set to same pstate level for low
pstate and not high.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Evan Quan <Evan.Quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cb5932f8

openeuler / Kernel: synced successfully more than 1 year ago