Commits · 53b3f8f40e6cff36ae12b11e6d6b308af3c7e53f · openeuler / Kernel

Aug 25, 2020 · 1 commit

drm/amdgpu: refine codes to avoid reentering GPU recovery · 53b3f8f4

Submitted by Dennis Li on Aug 19, 2020

If other threads already hold the reset lock, recovery fails at
try_lock. Therefore, introduce the atomics hive->in_reset and
adev->in_gpu_reset to avoid re-entering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
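A minimal userspace sketch of the re-entry guard, assuming C11 atomics stand in for the kernel's atomic_t. The first caller to flip the flag from 0 to 1 runs recovery; a concurrent caller sees it already set and backs off instead of failing on a lock. `fake_adev`, `try_begin_recovery()`, `end_recovery()` and `fake_in_reset()` are hypothetical names, not the driver's API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; only the reset flag is modelled. */
struct fake_adev {
	atomic_int in_gpu_reset; /* 0 = idle, 1 = recovery in progress */
};

/* Returns true if this caller won the race and must perform recovery. */
static bool try_begin_recovery(struct fake_adev *adev)
{
	int expected = 0;

	/* Atomically 0 -> 1; fails if another path already set the flag. */
	return atomic_compare_exchange_strong(&adev->in_gpu_reset, &expected, 1);
}

static void end_recovery(struct fake_adev *adev)
{
	atomic_store(&adev->in_gpu_reset, 0);
}

/* Read-only check, in the spirit of the amdgpu_in_reset() wrapper. */
static bool fake_in_reset(struct fake_adev *adev)
{
	return atomic_load(&adev->in_gpu_reset) != 0;
}
```

Compare-and-swap makes "check" and "set" one indivisible step; a separate load followed by a store would let two recovery paths both observe 0 and both proceed.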

Aug 19, 2020 · 2 commits

drm/amdgpu/pm: only hide average power on SI and pre-RENOIR APUs · 367deb67

Submitted by Alex Deucher on Aug 17, 2020

We can get this on RENOIR and newer via the SMU metrics
table.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: remove duplicate check · d0eb1b5c

Submitted by Alex Deucher on Aug 17, 2020

FAMILY_KV parts are APUs, and we already check for APUs.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Aug 15, 2020 · 3 commits

drm/amd/pm: optimize the power related source code layout · e098bc96

Submitted by Evan Quan on Aug 13, 2020

The goal is to provide a clear entry point (for power routines).
This also helps maintain a clear view of the frameworks
used on different ASICs. Hopefully all of this makes the power
code friendlier to work with.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: put those exposed power interfaces in amdgpu_dpm.c · e9372d23

Submitted by Evan Quan on Aug 13, 2020

Like the other power interfaces.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

Submitted by Christian König on Aug 12, 2020

The whole approach wasn't thought through to the end.

We already had a reset lock like this in the past, and it caused the same problems as this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Aug 7, 2020 · 1 commit

drm/amd/powerplay: add new sysfs interface for retrieving gpu metrics(V2) · 25c933b1

Submitted by Evan Quan on Jul 23, 2020

A new interface for the UMD to retrieve GPU metrics data.

V2: enrich the documentation
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 31, 2020 · 2 commits

Revert "drm/amdgpu: Fix NULL dereference in dpm sysfs handlers" · 2456c290

Submitted by Alex Deucher on Jul 30, 2020

This regressed some working configurations, so revert it. Will
fix this properly for 5.9 and backport then.

This reverts commit 38e0c89a.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: skip crit temperature values on APU (v2) · 35dab589

Submitted by Huang Rui on Jul 27, 2020

APU platforms do not expose a PPTable descriptor, so the max/min
temperature values cannot be obtained on APUs.

v2: Stoney needs to skip crit temperature as well.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 28, 2020 · 2 commits

drm/amd/powerplay: revise the outputs layout of amdgpu_pm_info debugfs · 81b41ff5

Submitted by Evan Quan on Jul 14, 2020

The amdgpu_pm_info debugfs output currently shows the clock gating
status first, followed by the current clock/power information. However,
retrieving the clock gating status may pull GFX out of the CG state, which
makes the subsequently retrieved clock/power information inaccurate.

To overcome this with minimal impact, the output is reordered
to show the current clock/power information first.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

Submitted by Dennis Li on Jul 8, 2020

When the GPU hangs, the driver has multiple paths into amdgpu_device_gpu_recover;
the atomics adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe for other threads to access the GPU,
which may cause the GPU reset to fail. Therefore, the new rw_semaphore
adev->reset_sem is introduced, which protects the GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and the file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in the kfd
driver.
3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid
re-entering GPU recovery for the same GPU hang.

v3:
1. change back to using adev->reset_sem to protect the kfd callback
functions, because dqm_lock couldn't protect all code paths; for example,
free_mqd must be called outside of dqm_lock:

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce the atomic hive->in_reset, to avoid
re-entering GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove commented-out code in amdgpu_device.c
3. add a more detailed comment in the commit message
4. define a wrapper function amdgpu_in_reset

v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 23, 2020 · 1 commit

drm/amdgpu/powerplay: add some documentation about memory clock · ccda42a4

Submitted by Alex Deucher on Jul 16, 2020

We expose the actual memory controller clock rate in Linux,
not the effective memory clock of the DRAMs. To translate
it, use the following formulas:

Clock conversion (MHz):
HBM: effective_memory_clock = memory_controller_clock * 1
G5:  effective_memory_clock = memory_controller_clock * 1
G6:  effective_memory_clock = memory_controller_clock * 2

DRAM data rate (MT/s):
HBM: effective_memory_clock * 2 = data_rate
G5:  effective_memory_clock * 4 = data_rate
G6:  effective_memory_clock * 8 = data_rate

Bandwidth (MB/s):
data_rate * vram_bit_width / 8 = memory_bandwidth

Some examples:
G5 on RX460:
memory_controller_clock = 1750 MHz
effective_memory_clock = 1750 MHz * 1 = 1750 MHz
data_rate = 1750 * 4 = 7000 MT/s
memory_bandwidth = 7000 * 128 bits / 8 = 112000 MB/s

G6 on RX5600:
memory_controller_clock = 900 MHz
effective_memory_clock = 900 MHz * 2 = 1800 MHz
data_rate = 1800 * 8 = 14400 MT/s
memory_bandwidth = 14400 * 192 bits / 8 = 345600 MB/s
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
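The three conversion steps compose into one calculation. A small sketch that encodes them directly (the enum and function names are illustrative, with G5/G6 meaning GDDR5/GDDR6); the RX460 and RX5600 worked examples above serve as checks.

```c
#include <stdint.h>

/* Memory types from the conversion table. */
enum mem_type { MEM_HBM, MEM_G5, MEM_G6 };

/* Clock conversion: memory_controller_clock (MHz) -> effective_memory_clock (MHz). */
static uint64_t effective_clock_mhz(enum mem_type t, uint64_t mc_clock_mhz)
{
	return t == MEM_G6 ? mc_clock_mhz * 2 : mc_clock_mhz * 1;
}

/* DRAM data rate: effective_memory_clock (MHz) -> data_rate (MT/s). */
static uint64_t data_rate_mts(enum mem_type t, uint64_t eff_clock_mhz)
{
	switch (t) {
	case MEM_HBM: return eff_clock_mhz * 2;
	case MEM_G5:  return eff_clock_mhz * 4;
	case MEM_G6:  return eff_clock_mhz * 8;
	}
	return 0; /* not reached for valid enum values */
}

/* Bandwidth: data_rate * vram_bit_width / 8 = memory_bandwidth (MB/s). */
static uint64_t bandwidth_mbs(enum mem_type t, uint64_t mc_clock_mhz,
			      uint64_t vram_bit_width)
{
	uint64_t rate = data_rate_mts(t, effective_clock_mhz(t, mc_clock_mhz));

	return rate * vram_bit_width / 8;
}
```

For example, `bandwidth_mbs(MEM_G5, 1750, 128)` reproduces the RX460 figure of 112000 MB/s, and `bandwidth_mbs(MEM_G6, 900, 192)` the RX5600 figure of 345600 MB/s.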

Jul 22, 2020 · 1 commit

drm/amdgpu: Fix NULL dereference in dpm sysfs handlers · 9cb26821

Submitted by Paweł Gronowski on Jul 19, 2020

A NULL dereference occurs when a string that does not end with a space or
newline is written to some dpm sysfs interfaces (for example pp_dpm_sclk).
This happens because strsep sets tmp to NULL if the delimiter
is not present in the string, and tmp is then dereferenced by tmp[0].

Reproduction example:
sudo sh -c 'echo -n 1 > /sys/class/drm/card0/device/pp_dpm_sclk'
Signed-off-by: Paweł Gronowski <me@woland.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
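The failure mode can be reproduced in userspace. The sketch below uses a local helper, `strsep_like()` (hypothetical), that mirrors the documented strsep() semantics with standard C only, plus a token-counting loop that tests `tmp` for NULL before reading `tmp[0]`. It illustrates the pitfall; it is not the driver's parsing code.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in with the documented strsep() semantics (standard C only):
 * returns the current token and advances *stringp past the delimiter,
 * or sets *stringp to NULL when no delimiter is left. */
static char *strsep_like(char **stringp, const char *delim)
{
	char *start = *stringp;
	char *end;

	if (start == NULL)
		return NULL;
	end = strpbrk(start, delim);
	if (end != NULL) {
		*end = '\0';
		*stringp = end + 1;
	} else {
		*stringp = NULL; /* input written with "echo -n 1" ends up here */
	}
	return start;
}

/* Counts non-empty tokens in a writable buffer such as "1" or "0 1 2\n".
 * A loop written as while (tmp[0]) reads through NULL after the last token
 * when the input has no trailing delimiter; testing tmp first avoids that. */
static int count_tokens(char *buf)
{
	char *tmp = buf;
	int n = 0;

	while (tmp != NULL && tmp[0] != '\0') {
		char *tok = strsep_like(&tmp, " \n");

		if (tok[0] != '\0') /* empty tokens come from repeated delimiters */
			n++;
	}
	return n;
}
```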

Jul 16, 2020 · 1 commit

drm/amd/powerplay: drop unused APIs and parameters · 42f75c84

Submitted by Evan Quan on Jul 2, 2020

Leftovers from the previous performance level setting cleanups.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 8, 2020 · 1 commit

drm/amdgpu: Move the mutex lock/unlock out · fa4a8820

Submitted by Alex Jivin on Jul 6, 2020

Move the mutex lock/unlock outside of the if(),
as the mutex is always taken: either in the if()
branch or in the else branch.
Signed-off-by: Alex Jivin <alex.jivin@amd.com>
Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 3, 2020 · 2 commits

drm/amdgpu: use %u rather than %d for sclk/mclk · 2a80f883

Submitted by Alex Deucher on Jul 1, 2020

Large clock values may overflow and show up as negative.

Reported by prOMiNd on IRC.
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
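A userspace illustration, assuming a clock value held in a 32-bit unsigned integer that exceeds INT32_MAX: interpreted as signed it prints as a negative number, while the unsigned conversion prints the real value. `format_clock()` is a hypothetical helper, not the driver's sysfs code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats a clock value into buf and returns the number of characters written.
 * signed_fmt != 0 mimics the old "%d" output by converting to int32_t first
 * (implementation-defined above INT32_MAX; GCC and Clang wrap modulo 2^32). */
static int format_clock(char *buf, size_t len, uint32_t value, int signed_fmt)
{
	if (signed_fmt)
		return snprintf(buf, len, "%" PRId32, (int32_t)value);
	return snprintf(buf, len, "%" PRIu32, value);
}
```

For example, 3000000000 formatted the signed way comes out as -1294967296 on a two's-complement target, while the unsigned format prints 3000000000.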

drm/amdgpu: SI support for UVD and VCE power managment · a71a4f50

Submitted by Alex Jivin on Jun 24, 2020

Port functionality from the Radeon driver to support
UVD and VCE power management.
Signed-off-by: Alex Jivin <alex.jivin@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jul 1, 2020 · 6 commits

drm/amdgpu: remove perf level dpm in one-VF · a2e6ad19

Submitted by Wenhui Sheng on Jun 12, 2020

On the Navi12 platform, the power_dpm_force_performance_level node
doesn't work correctly in one-VF mode because at least three
SMU messages are not supported:
SMU_MSG_SetSoftMaxByFreq
SMU_MSG_SetSoftMinByFreq
SMU_MSG_TransferTableDram2Smu
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove redundant initialization of variable ret · 7c8e0835

Submitted by Colin Ian King on Jun 18, 2020

The variable ret is being initialized with a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: fix ref count leak when pm_runtime_get_sync fails · 66429300

Submitted by Alex Deucher on Jun 17, 2020

The call to pm_runtime_get_sync increments the usage counter even on
failure, leading to an incorrect ref count.
On failure, decrement the ref count before returning.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
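A userspace model of the leak and its fix, assuming a simplified device with a plain usage counter. Like pm_runtime_get_sync(), the hypothetical `fake_get_sync()` raises the count before the resume attempt and leaves it raised on failure, so the caller must drop the reference on the error path as well (in the kernel this is done with a pm_runtime_put*() call). All names here are illustrative.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; usage_count mirrors the runtime-PM counter. */
struct fake_dev {
	int usage_count;
	bool resume_fails; /* simulate a failed resume */
};

/* Behaves like pm_runtime_get_sync() in the way that matters here:
 * the count is incremented before the resume attempt, even if it fails. */
static int fake_get_sync(struct fake_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -EIO : 0;
}

static void fake_put(struct fake_dev *dev)
{
	dev->usage_count--;
}

/* A sysfs-style read handler with the balanced error path. */
static int read_power_attr(struct fake_dev *dev, int *out)
{
	int ret = fake_get_sync(dev);

	if (ret < 0) {
		fake_put(dev); /* drop the reference taken despite the failure */
		return ret;
	}
	*out = 42; /* placeholder for the real hardware query */
	fake_put(dev);
	return 0;
}
```

Without the `fake_put()` in the error branch, each failed read would leave `usage_count` one higher, and the device would never be allowed to suspend again.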

drm/amdgpu: fix documentation around busy_percentage · f503fe69

Submitted by Alex Deucher on Jun 15, 2020

Rename the GPU busy percentage for consistency and
add the mem busy percentage documentation.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: update comment to clarify Overdrive interfaces · bd09331a

Submitted by Alex Deucher on Jun 15, 2020

Vega10 and earlier ASICs use one interface; Vega20 and newer
use another.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/powerplay: drop unused code around power limit · 4cb738ab

Submitted by Evan Quan on Jun 8, 2020

Drop unused APIs, variables and arguments.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jun 18, 2020 · 2 commits

drm/amdgpu: fix documentation around busy_percentage · da9cebe1

Submitted by Alex Deucher on Jun 15, 2020

Rename the GPU busy percentage for consistency and
add the mem busy percentage documentation.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: update comment to clarify Overdrive interfaces · 7386f5c9

Submitted by Alex Deucher on Jun 15, 2020

Vega10 and earlier ASICs use one interface; Vega20 and newer
use another.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Jun 3, 2020 · 1 commit

drm/amdgpu: Add unique_id and serial_number for Arcturus v3 · 81a16241

Submitted by Kent Russell on Apr 27, 2020

Add support for unique_id and serial_number, as these are now
the same value, and will be for future ASICs as well.

v2: Explicitly create unique_id only for VG10/20/ARC
v3: Change set_unique_id to get_unique_id for clarity
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

May 30, 2020 · 3 commits

drm/amdgpu: added a sysfs interface for thermal throttling related V4 · b265bdbd

Submitted by Evan Quan on May 22, 2020

Users can check and set whether throttling logging is enabled and
the interval between log messages.

V2: simplify the sysfs interface (no string parsing)
V3: add proper lock protection when updating throttling_logging_rs.interval
V4: documentation cosmetics per Luben's suggestion
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: return an error during GPU reset or suspend (v2) · 9271dfd9

Submitted by Alex Deucher on May 24, 2020

Return an error from the sysfs and debugfs power interfaces during
GPU reset and suspend. This prevents access to the hardware while it may
be in an unusable state.

v2: squash in fix to drop suspend check
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/pm: return an error during GPU reset or suspend (v2) · 48b270bb

Submitted by Alex Deucher on May 24, 2020

Return an error from the sysfs and debugfs power interfaces during
GPU reset and suspend. This prevents access to the hardware while it may
be in an unusable state.

v2: squash in fix to drop suspend check
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

May 27, 2020 · 1 commit

drm/amdgpu: fix device attribute node create failed with multi gpu · ba02fd6b

Submitted by Kevin Wang on May 22, 2020

The original design used the variable "attr->states" to save the
states supported by the current GPU device. With multiple GPU devices,
when the second GPU device is probed, the driver checks the attribute
node states left by the previous GPU device to decide whether to create
each attribute node, which causes attribute node creation to fail on
the other GPU devices.

1. add member attr_list to amdgpu_device to link the supported device attribute nodes.
2. add a new structure "struct amdgpu_device_attr_entry{}" to track device attribute state.
3. drop member "states" from amdgpu_device_attr.

v2:
1. move "attr_list" into amdgpu_pm and rename it to "pm_attr_list".
2. refine the parameters of the create & remove device node functions.

fix:
drm/amdgpu: optimize amdgpu device attribute code
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
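The failure mode is easier to see in miniature. This sketch uses hypothetical types and attribute names (`attr_a`, `attr_b`, `fake_gpu`), not the driver's structures: when the "unsupported" mark lives in storage shared by every device, the first GPU's decision suppresses the attribute for a second GPU that does support it. Recording the result per device, the role the per-device list plays after this change, keeps probes independent.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_ATTRS 2

/* Shared, read-only attribute descriptors (one table for all GPUs). */
struct attr_desc {
	const char *name;
	bool (*supported)(int asic);
};

static bool always(int asic) { (void)asic; return true; }
static bool only_asic_1(int asic) { return asic == 1; }

static const struct attr_desc attrs[NUM_ATTRS] = {
	{ "attr_a", always },
	{ "attr_b", only_asic_1 },
};

/* Buggy pattern: per-attribute state kept in storage shared by all devices. */
static bool shared_dead[NUM_ATTRS];

static size_t create_attrs_shared(int asic)
{
	size_t n = 0;

	for (size_t i = 0; i < NUM_ATTRS; i++) {
		if (shared_dead[i])
			continue; /* stale decision made for an earlier GPU */
		if (!attrs[i].supported(asic)) {
			shared_dead[i] = true;
			continue;
		}
		n++;
	}
	return n;
}

/* Fixed pattern: each device records its own created attributes. */
struct fake_gpu {
	int asic;
	bool created[NUM_ATTRS];
};

static size_t create_attrs(struct fake_gpu *gpu)
{
	size_t n = 0;

	for (size_t i = 0; i < NUM_ATTRS; i++) {
		gpu->created[i] = attrs[i].supported(gpu->asic);
		if (gpu->created[i])
			n++;
	}
	return n;
}
```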

May 23, 2020 · 3 commits

drm/amdgpu: add apu flags (v2) · 54f78a76

Submitted by Alex Deucher on May 15, 2020

Add some APU flags to simplify handling of different APU
variants. It's easier to understand the special cases
if we use named flags rather than checking device IDs and
silicon revisions.

v2: rebase on latest code
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
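A sketch of the named-flag approach, assuming hypothetical flag names and a made-up device-ID/revision mapping (the real IDs and thresholds are not given in this log; 0x1000, 0x2000, 0x3000 and the revision cut-off are placeholders). The ID and revision checks happen once at init, and later code tests readable bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; one bit per APU variant. */
enum fake_apu_flags {
	APU_IS_RAVEN   = 1u << 0,
	APU_IS_RAVEN2  = 1u << 1,
	APU_IS_PICASSO = 1u << 2,
	APU_IS_RENOIR  = 1u << 3,
};

/* Done once at init: translate device ID / revision into flags.
 * The IDs and the revision threshold are placeholders, not real values. */
static uint32_t detect_apu_flags(uint16_t device_id, uint8_t rev)
{
	if (device_id == 0x1000)
		return rev >= 0x8 ? APU_IS_RAVEN2 : APU_IS_RAVEN;
	if (device_id == 0x2000)
		return APU_IS_PICASSO;
	if (device_id == 0x3000)
		return APU_IS_RENOIR;
	return 0; /* not an APU known to this sketch */
}

/* A later special case reads as intent rather than as magic numbers. */
static bool is_raven_family(uint32_t apu_flags)
{
	return (apu_flags & (APU_IS_RAVEN | APU_IS_RAVEN2 | APU_IS_PICASSO)) != 0;
}
```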

drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven · cbd2d08c

Submitted by chen gong on May 21, 2020

[Problem description]
1. Boot up the Picasso platform and launch the desktop; don't do anything (the APU enters the "gfxoff" state)
2. Log in to the platform remotely over SSH, then run:
sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level"
sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz)
3. Move the mouse around in the window
4. Phenomenon: the screen freezes

The tester switches the sclk level while glmark2 is running.
The APU enters the "gfxoff" state intermittently while glmark2 is running.
The system hangs if GFXCLK is fixed to 1400MHz while the APU is in the "gfxoff"
state.

[Debug]
1. Fix SCLK to X MHz
1400: screen frozen, screen black, then OS reboots.
1300: screen frozen.
1200: screen frozen, screen black.
1100: screen frozen, screen black, then OS reboots.
1000: screen frozen, screen black.
900: screen frozen, screen black, then OS reboots.
800: normal, issue disappears.
700: normal, issue disappears.
2. SBIOS setting: AMD CBS --> SMU Debug Options --> SMU Debug --> "GFX DLDO Psm Margin Control":
50 : normal, issue disappears.
45 : normal, issue disappears.
40 : normal, issue disappears.
35 : normal, issue disappears.
30 : screen black.
25 : screen frozen, then blurred screen.
20 : screen frozen.
15 : screen black.
10 : screen frozen.
5 : screen frozen, then blurred screen.
3. Disable the GFXOFF feature
normal, issue disappears.

[Why]
After a period of debugging with the Sys Eng team and the SMU team, the Sys
Eng team said this is a voltage/frequency marginality issue, not a F/W or H/W
bug. This experiment proves that the default targetPsm [for f=1400MHz] is
not sufficient when GFXOFF is enabled on Picasso.

The SMU team thinks it is an odd test condition to force sclk="1400MHz" while the
GPU is in the "gfxoff" state and then wake up GFX. SCLK should be at the
"lowest frequency" during gfxoff.

[How]
Disable gfxoff when setting manual mode.
Enable gfxoff again when setting another mode (exiting manual mode).

By the way, from the user's point of view, once the user switches to manual
mode and forces the SCLK frequency, they don't want SCLK to be controlled by the
workload. Switching to manual mode becomes meaningless if the APU enters "gfxoff"
due to lack of workload at that point.

Tips: The same issue is observed on Raven.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
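The [How] section amounts to a small state transition. A sketch under assumed, hypothetical names (`fake_pm`, `set_perf_level()`, a boolean `gfxoff_allowed`): GFXOFF is disallowed on entering manual mode and re-allowed only on leaving it, so writing the same level twice does not toggle it twice. Acting only on transitions matters if the underlying GFXOFF control is reference-counted, which this sketch assumes rather than confirms.

```c
#include <stdbool.h>

/* Hypothetical performance levels; only MANUAL matters for this sketch. */
enum perf_level { LEVEL_AUTO, LEVEL_LOW, LEVEL_HIGH, LEVEL_MANUAL };

struct fake_pm {
	enum perf_level level;
	bool gfxoff_allowed;
};

/* Changes the forced performance level, toggling GFXOFF only on the
 * transitions into and out of manual mode. */
static void set_perf_level(struct fake_pm *pm, enum perf_level next)
{
	bool was_manual = pm->level == LEVEL_MANUAL;
	bool is_manual = next == LEVEL_MANUAL;

	if (!was_manual && is_manual)
		pm->gfxoff_allowed = false; /* entering manual: keep GFX powered */
	else if (was_manual && !is_manual)
		pm->gfxoff_allowed = true;  /* leaving manual: allow GFXOFF again */

	pm->level = next;
}
```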

drm/amdgpu: fix pm sysfs node handling (v2) · d5c8ffb9

Submitted by Alex Deucher on May 21, 2020

Fix typos that prevented the sysfs nodes from showing up.

v2: switch other files in addition to pp_clk_voltage

Fixes: 4e01847c ("drm/amdgpu: optimize amdgpu device attribute code")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1150
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>

May 22, 2020 · 3 commits

drm/amdgpu: improve error handling in pcie_bw · d08d692e

Submitted by Alex Deucher on May 19, 2020

1. Initialize the counters to 0 in case the callback
   fails to initialize them.
2. The counters don't exist on APUs so return an error
   for them.
3. Return an error if the callback doesn't exist.
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: off by one in amdgpu_device_attr_create_groups() error handling · 62cc895c

Submitted by Dan Carpenter on May 20, 2020

This loop in the error handling code should start at "i - 1" and end at
"i == 0". Currently it starts at "i" and ends at "i == 1". The result
is that it removes one attribute that wasn't created yet, and leaks the
zeroth attribute.

Fixes: 4e01847c ("drm/amdgpu: optimize amdgpu device attribute code")
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
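The corrected unwind order can be sketched with hypothetical `create_one()`/`remove_one()` helpers standing in for the per-attribute create/remove calls: after item `i` fails, only items `i - 1` down to `0` exist, so the cleanup must start at `i - 1` and include index 0.

```c
#include <stdbool.h>

#define NUM_ITEMS 5

static bool created[NUM_ITEMS];

/* Hypothetical create step that fails at index fail_at. */
static int create_one(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	created[i] = true;
	return 0;
}

static void remove_one(int i)
{
	created[i] = false;
}

/* Creates items 0..NUM_ITEMS-1; on failure removes only what was created. */
static int create_all(int fail_at)
{
	int i, ret = 0;

	for (i = 0; i < NUM_ITEMS; i++) {
		ret = create_one(i, fail_at);
		if (ret)
			goto unwind;
	}
	return 0;

unwind:
	/* Item i failed, so items i - 1 down to 0 exist. The buggy form,
	 * for (; i > 0; i--) remove_one(i), touches item i and leaks item 0. */
	while (--i >= 0)
		remove_one(i);
	return ret;
}
```

`while (--i >= 0)` decrements before the first test, which gives exactly the "start at i - 1, end at i == 0" order described above.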

drm/amdgpu: cleanup unnecessary virt sriov check in amdgpu attribute · 9f76f7e8

Submitted by Kevin Wang on May 7, 2020

The amdgpu device attribute nodes are created according to the SR-IOV VF
mode at runtime, so clean up the unnecessary SR-IOV checks in the
attribute operation functions.
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

May 18, 2020 · 1 commit

drm/amdgpu: optimize amdgpu device attribute code · 4e01847c

Submitted by Kevin Wang on Apr 27, 2020

Unify the amdgpu device attribute node functions:
1. add some helper functions to create amdgpu device attribute nodes.
2. create device nodes according to the device attr flags in different VF modes.
3. rename some functions to fit the new interface.

v2:
1. remove the ATTR_STATE_DEAD and ATTR_STATE_ALIVE enums.
2. rename the callback function perform to attr_update.
3. modify some variable names
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Apr 24, 2020 · 2 commits

drm/amdgpu: skip sysfs node not belong to one vf mode · 8efd7275

Submitted by Monk Liu on Apr 22, 2020

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Yintian Tao <yttao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Init data to avoid oops while reading pp_num_states. · 6f81b2d0

Submitted by limingyu on Apr 22, 2020

For chips like CHIP_OLAND with SI support enabled (amdgpu.si_support=1),
amdgpu exposes pp_num_states in the /sys directory.
Reading the pp_num_states file then executes the
amdgpu_get_pp_num_states function. In our case, the data hasn't
been initialized, so the kernel accesses an illegal
address, triggers a segmentation fault, and the system reboots soon after:

    uos@uos-PC:~$ cat /sys/devices/pci0000\:00/0000\:00\:00.0/0000\:01\:00.0/pp_num_states

    Message from syslogd@uos-PC at Apr 22 09:26:20 ...
     kernel:[   82.154129] Internal error: Oops: 96000004 [#1] SMP

This patch fixes the problem so that reading the file no longer
triggers a kernel segmentation fault.
Signed-off-by: limingyu <limingyu@uniontech.com>
Signed-off-by: zhoubinbin <zhoubinbin@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Apr 9, 2020 · 1 commit

drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event · 2a20e630

Submitted by Aaron Ma on Apr 3, 2020

On ARCTURUS and RENOIR, powerplay is not supported yet.
When the power jack is plugged in or unplugged, an ACPI event is issued,
which then triggers a kernel NULL pointer BUG.
Check for NULL pointers before calling.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
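A minimal sketch of the guard, assuming a hypothetical ops table that a chip without powerplay support leaves NULL (or with a NULL callback). The event handler checks both pointers before the indirect call. `fake_pp_funcs`, `power_source_changed` and `on_power_source_event()` are illustrative names, not the driver's.

```c
#include <stddef.h>

/* Hypothetical ops table; a chip without powerplay support may leave it NULL. */
struct fake_pp_funcs {
	void (*power_source_changed)(void *handle, int on_ac);
};

struct fake_adev {
	const struct fake_pp_funcs *pp_funcs;
	void *pp_handle;
};

/* Modelled ACPI power-source event handler: returns 1 if the callback ran. */
static int on_power_source_event(struct fake_adev *adev, int on_ac)
{
	if (adev->pp_funcs == NULL || adev->pp_funcs->power_source_changed == NULL)
		return 0; /* nothing to do, and no NULL dereference */
	adev->pp_funcs->power_source_changed(adev->pp_handle, on_ac);
	return 1;
}

/* Demo callback used only to observe that the call happened. */
static int demo_calls;

static void demo_callback(void *handle, int on_ac)
{
	(void)handle;
	(void)on_ac;
	demo_calls++;
}
```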

openeuler / Kernel: synced successfully more than a year ago
