Commits · 42e5fee65e918f16b178ea242b6a96234411cc53 · openeuler / Kernel

27 Feb, 2020 · 8 commits

drm/amdgpu: add VM update fences back to the root PD v2 · 42e5fee6

Authored by Christian König on Feb 19, 2020

Add update fences to the root PD while mapping BOs.

Otherwise PDs freed during the mapping won't wait for
updates to finish and can cause corruptions.

v2: rebased on drm-misc-next
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 90b69cdc drm/amdgpu: stop adding VM updates fences to the resv obj
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Tested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup amdgpu_ring_fini · 6f9f9604

Authored by Nirmoy Das on Feb 25, 2020

Clean up amdgpu_ring_fini to check the prerequisites before changing ring->sched.ready.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add Arcturus D342 page retire support · ef1caf48

Authored by John Clements on Feb 25, 2020

Check the Arcturus SKU type to select the I2C address of the page retirement EEPROM.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: toggle DF-Cstate to protect DF reg access · 938065d4

Authored by Hawking Zhang on Feb 24, 2020

The driver needs to take DF out of Cstate before any DF register
access. Otherwise, the DF register may not be accessible.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move get_xgmi_relative_phy_addr to amdgpu_xgmi.c · 19744f5f

Authored by Hawking Zhang on Feb 24, 2020

Centralize all the xgmi related functions in amdgpu_xgmi.c.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add dpm helper function for DF Cstate control · 53e0f1e6

Authored by Hawking Zhang on Feb 24, 2020

The helper function hides the software smu and legacy powerplay
implementations of DF Cstate control.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update psp firmwares loading sequence V2 · 995da6cc

Authored by Evan Quan on Feb 24, 2020

For those ASICs with DF Cstate management centralized to PMFW,
TMR setup should be performed between pmfw loading and other
non-psp firmwares loading.

V2: skip possible SMU firmware reloading
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Remove kfd eviction fence before release bo (v2) · f4a3c42b

Authored by xinhui pan on Feb 11, 2020

No need to trigger eviction as the memory mapping will not be used
anymore.

All pt/pd bos share the same resv, hence the same shared eviction fence.
Every time a page table is freed, the fence will be signaled, and that
causes unexpected kfd evictions.

v2: squash in 32 bit fix

CC: Christian König <christian.koenig@amd.com>
CC: Felix Kuehling <felix.kuehling@amd.com>
CC: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

26 Feb, 2020 · 6 commits

drm/amdgpu: Improve Vega20 XGMI TLB flush workaround · b80cd524

Authored by Felix Kuehling on Jan 17, 2020

Using a heavy-weight TLB flush once is not sufficient. Concurrent
memory accesses in the same TLB cache line can re-populate TLB entries
from stale texture cache (TC) entries while the heavy-weight TLB
flush is in progress. To fix this race condition, perform another TLB
flush after the heavy-weight one, when TC is known to be clean.

Move the workaround into the low-level TLB flushing functions. This way
they apply to amdgpu as well, and KIQ-based TLB flush only needs to
synchronize once.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix psp ucode not loaded in bare-metal · 82c4ebfa

Authored by Monk Liu on Feb 21, 2020

For bare-metal we always need to load sys/sos/kdb.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu/gmc_v9: save/restore sdpif regs during S3 · c2ecd79b

Authored by Shirish S on Jan 27, 2020

Fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM.
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/discovery: make the discovery code less chatty · 91aeda18

Authored by Alex Deucher on Feb 19, 2020

Make the IP block base output debug only.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix colliding of preemption · 6325b38d

Authored by Monk Liu on Feb 06, 2020

what:
some OS preemption paths are mixed up with world switch preemption

fix:
clean up that logic so OS preemption is not mixed with world switch

This patch is a general fix for issues that come from SRIOV MCBP, but
there are still UMD side issues not resolved yet, so this patch
cannot fix every world switch bug.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup some incorrect reg access for SRIOV · f77a9c92

Authored by Monk Liu on Jan 23, 2020

1)
We shouldn't load PSP kdb and sys/sos for VF; they are
supposed to be handled by the hypervisor.

2)
ih reroute doesn't work on VF, thus we should avoid calling
it. Besides, VF should not use those PSP register sets for PF.

3)
We shouldn't load SMU ucode under SRIOV, otherwise PSP would report
an error.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

19 Feb, 2020 · 4 commits

drm/amdgpu: drop the non-sense firmware version check on arcturus · 14008574

Authored by Evan Quan on Feb 17, 2020

Drop the check because the firmware versions of arcturus are different
from other gfx9 ASICs. This also eliminates the warning ("CP firmware
version too old, please update!") caused by this check.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add is_raven_kicker judgement for raven1 · f61f01b1

Authored by changzhu on Feb 14, 2020

The rlc version of raven_kicker_rlc is different from the legacy rlc
version of raven_rlc. So add a judgement function for
raven_kicker_rlc and avoid disabling GFXOFF when loading raven_kicker_rlc.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU · 3cd4f618

Authored by Guchun Chen on Feb 13, 2020

When an NBIO RAS error happens, the error counter information needs to
be recorded before triggering GPU reset. This fixes the missing error
counter values when reading from debugfs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: log on non-zero error conter per IP before GPU reset · 313c8fd3

Authored by Guchun Chen on Feb 13, 2020

Once a sync flood interrupt is triggered by a RAS error, it is
necessary to log and print the non-zero error counters before the
actual GPU recovery job. This helps the user quickly find where the
RAS error comes from.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

13 Feb, 2020 · 8 commits

drm/amdgpu: return -EFAULT if copy_to_user() fails · 434cbcb1

Authored by Dan Carpenter on Feb 12, 2020

The copy_to_user() function returns the number of bytes remaining to be
copied, but we want to return a negative error code to the user.

Fixes: 030d5b97 ("drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
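
The return-value pitfall described in this commit can be sketched in user space. This is only an illustration under stated assumptions: `mock_copy_to_user`, `read_buggy`, `read_fixed`, and the `writable` parameter are hypothetical stand-ins, not the kernel API or the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for copy_to_user(): like the kernel
 * helper, it returns the number of bytes that could NOT be copied
 * (0 on success). "writable" models how much of dst is accessible.
 */
size_t mock_copy_to_user(void *dst, const void *src, size_t n,
                         size_t writable)
{
    size_t done = n < writable ? n : writable;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, done);
    return n - done;
}

/* Buggy pattern: passes the "bytes remaining" count back to the caller. */
long read_buggy(void *dst, const void *src, size_t n, size_t writable)
{
    long r = (long)mock_copy_to_user(dst, src, n, writable);

    if (r)
        return r;           /* positive count, not an error code */
    return (long)n;
}

/* Fixed pattern: any short copy is reported as -EFAULT. */
long read_fixed(void *dst, const void *src, size_t n, size_t writable)
{
    if (mock_copy_to_user(dst, src, n, writable))
        return -EFAULT;
    return (long)n;
}
```

With the buggy pattern a caller cannot tell a partial failure from a successful short read, because both are positive numbers; the fixed pattern makes the failure unambiguous.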

drm/amdgpu/gfx10: disable gfxoff when reading rlc clock · 72b4c01d

Authored by Alex Deucher on Feb 12, 2020

Otherwise we read back all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9: disable gfxoff when reading rlc clock · e5f13495

Authored by Alex Deucher on Feb 12, 2020

Otherwise we read back all ones. Fixes rlc counter
readback while gfxoff is active.
Reviewed-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/soc15: fix xclk for raven · b90c4d66

Authored by Alex Deucher on Feb 12, 2020

It's 25 MHz (refclk / 4). This fixes the interpretation
of the rlc clock counter.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
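
To see why the clock value matters for interpreting a counter, here is a minimal sketch of a tick-to-time conversion. The 100 MHz reference clock is inferred from "25 MHz (refclk / 4)"; the macro and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define REFCLK_HZ     100000000ULL      /* assumed: 25 MHz * 4 */
#define RAVEN_XCLK_HZ (REFCLK_HZ / 4)   /* 25 MHz, per the commit message */

/* Convert a free-running counter value to nanoseconds at a given clock. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t clk_hz)
{
    assert(clk_hz != 0);
    /* Split into whole seconds and remainder to avoid overflowing
     * ticks * 1e9 for large counter values. */
    return (ticks / clk_hz) * 1000000000ULL +
           (ticks % clk_hz) * 1000000000ULL / clk_hz;
}
```

The same counter value of 25,000,000 ticks corresponds to one second at 25 MHz but only a quarter of a second if the reader assumes 100 MHz, which is the kind of misinterpretation a wrong xclk would produce.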

drm/amd/display: fix dtm unloading · c786530b

Authored by Bhawanpreet Lakha on Feb 07, 2020

There was a typo in the terminate command.

We should be calling psp_dtm_unload() instead of psp_hdcp_unload().

Fixes: 143f2305 ("drm/amdgpu: psp DTM init")
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/runpm: enable runpm on baco capable VI+ asics · 4fdda2e6

Authored by Alex Deucher on Oct 10, 2019

Seems to work reliably on VI+ except for a few so enable runpm barring
those where baco for runtime power management is not supported.

[rajneesh] Picked https://patchwork.freedesktop.org/patch/335402/ to
enable runtime pm with baco for kfd. Also fixed a checkpatch warning and
added extra checks for VEGA20 and ARCTURUS.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: refactor runtime pm for baco · 9593f4d6

Authored by Rajneesh Bhardwaj on Jan 21, 2020

So far the kfd driver implemented the same routines for runtime and
system wide suspend and resume (s2idle or mem). During system wide
suspend the kfd acquires an atomic lock that prevents any more user
processes from creating queues and interacting with the kfd driver and
amd gpu. This mechanism created a problem when the amdgpu device is
runtime suspended with BACO enabled. Any application that relies on the
kfd driver fails to load because the driver reports a locked kfd device
since the gpu is runtime suspended.

However, in an ideal case, when the gpu is runtime suspended the kfd
driver should be able to:

 - auto resume amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
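
The two goals listed in this commit (auto resume on demand, no runtime suspend while in use) are commonly met with usage counting. The following is a toy model of that idea only; `struct gpu_model` and its functions are hypothetical and are not the amdgpu or amdkfd code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; field and function names are illustrative. */
struct gpu_model {
    int  usage_count;   /* number of active compute clients */
    bool suspended;     /* true while runtime suspended */
};

/* A client requests compute service: resume if needed, take a reference. */
void client_get(struct gpu_model *gpu)
{
    if (gpu->suspended)
        gpu->suspended = false;     /* auto resume */
    gpu->usage_count++;
}

/* A client is done: drop its reference. */
void client_put(struct gpu_model *gpu)
{
    assert(gpu->usage_count > 0);
    gpu->usage_count--;
}

/* Runtime suspend is refused while any client holds a reference. */
bool try_runtime_suspend(struct gpu_model *gpu)
{
    if (gpu->usage_count > 0)
        return false;
    gpu->suspended = true;
    return true;
}
```

In this model a locked or suspended device never rejects a new client; the client's request itself wakes the device, and the reference it holds keeps the device awake until it is released.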

drm/amdgpu: Fix missing error check in suspend · 70bedd68

Authored by Rajneesh Bhardwaj on Jan 27, 2020

amdgpu_device_suspend might return an error code since it can be called
from both runtime and system suspend flows. Add the missing return code
in case of a failure.
Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

12 Feb, 2020 · 5 commits

drm/amdgpu: add flag for runtime suspend · f0f7ddfc

Authored by Alex Deucher on Feb 07, 2020

So we know whether we are in runtime suspend or
system suspend.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Do not move root PT bo to relocated list · a6605c43

Authored by xinhui pan on Feb 11, 2020

As the root PD has no parent, we just need to move its status to idle.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
CC: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct comment to clear up the confusion · 278628fa

Authored by Guchun Chen on Feb 11, 2020

The former comment reads as if it describes intended behavior in the
code, but it does not. So correct it.
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn2.5: fix warning · 3b4a18a3

Authored by James Zhu on Feb 07, 2020

Fix warning during switching to dpg pause mode for
VCN firmware Version ENC: 1.1 DEC: 1 VEP: 0 Revision: 16
Signed-off-by: James Zhu <James.Zhu@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: limit GDS clearing workaround in cold boot sequence · ea6f0931

Authored by Guchun Chen on Feb 09, 2020

The GDS clear workaround causes a gfx failure in the suspend/resume case.

[   98.679559] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <gfx_v9_0> failed -110
[   98.679561] PM: dpm_run_callback(): pci_pm_resume+0x0/0xa0 returns -110
[   98.679562] PM: Device 0000:03:00.0 failed to resume async: error -110

This workaround is specific to the HW bug of GDS's ECC error that
exists at cold boot, so bypass it in the suspend/resume case after
booting up.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

08 Feb, 2020 · 7 commits

drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_access_memory v2 · dd1ab799

Authored by Christian König on Feb 05, 2020

Make use of the better performance here as well.

This patch is only compile tested!

v2: fix calculation bug pointed out by Felix
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use amdgpu_device_vram_access in amdgpu_ttm_vram_read · 030d5b97

Authored by Christian König on Jan 24, 2020

This speeds up the access quite a bit from 2.2 MB/s to
2.9 MB/s on 32bit and 12.8 MB/s on 64bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2 · c12b84d6

Authored by Christian König on Jan 31, 2020

This should speed up debugging VRAM access a lot.

v2: add HDP flush/invalidate
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: optimize amdgpu_device_vram_access a bit. · ce05ac56

Authored by Christian König on Jan 24, 2020

Only write the _HI register when necessary.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
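
The idea of writing the high index register only when necessary can be sketched with a small model of an index/data register window. Everything here is an assumption for illustration: `struct reg_window` and both functions are hypothetical and only count register writes; they are not the driver's MMIO code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an index/data register window. */
struct reg_window {
    uint32_t index_lo;      /* low part of the address */
    uint32_t index_hi;      /* high part of the address */
    unsigned hi_writes;     /* how often the high register was written */
};

/* Naive variant: program both index registers for every dword. */
void access_naive(struct reg_window *w, uint64_t pos, uint64_t size)
{
    assert((pos & 3) == 0);     /* dword-aligned access */
    for (uint64_t end = pos + size; pos < end; pos += 4) {
        w->index_lo = (uint32_t)pos;
        w->index_hi = (uint32_t)(pos >> 32);
        w->hi_writes++;
    }
}

/* Optimized variant: touch the high register only when its value changes. */
void access_cached_hi(struct reg_window *w, uint64_t pos, uint64_t size)
{
    uint32_t cached_hi = 0;
    bool have_hi = false;       /* nothing written yet */

    assert((pos & 3) == 0);
    for (uint64_t end = pos + size; pos < end; pos += 4) {
        uint32_t hi = (uint32_t)(pos >> 32);

        w->index_lo = (uint32_t)pos;
        if (!have_hi || hi != cached_hi) {
            w->index_hi = hi;
            w->hi_writes++;
            cached_hi = hi;
            have_hi = true;
        }
    }
}
```

For a 4 KiB transfer that stays inside one 4 GiB window, the naive loop in this model performs 1024 high-register writes and the cached loop performs one.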

drm/amdgpu: fix amdgpu pmu to use hwc->config instead of hwc->conf · 42d708db

Authored by Jonathan Kim on Feb 06, 2020

hwc->conf was designated specifically for AMD APU IOMMU purposes.  This
could cause problems in performance and/or function since APU IOMMU
implementation is elsewhere.  Also hwc->conf and hwc->config are
different members of an anonymous union so hwc->conf aliases as
hw->last_tag.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
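
The aliasing described here can be reproduced with a reduced model of an anonymous union. The struct below is a simplified, hypothetical layout modeled on the fields named in the commit message; it is not the kernel's full perf event definition, and it assumes an ABI where `uint64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical model of an event with two overlapping views. */
struct hw_event_model {
    union {
        struct {                /* generic hardware-counter view */
            uint64_t config;
            uint64_t last_tag;
        };
        struct {                /* IOMMU-specific view */
            uint8_t  iommu_bank;
            uint8_t  iommu_cntr;
            uint16_t padding;
            uint64_t conf;
        };
    };
};

/* On ABIs with 8-byte uint64_t alignment, conf overlays last_tag. */
static_assert(offsetof(struct hw_event_model, conf) ==
              offsetof(struct hw_event_model, last_tag),
              "conf is expected to alias last_tag in this model");

/* Writing through the IOMMU view lands on last_tag, not on config. */
uint64_t last_tag_after_conf_write(uint64_t value)
{
    struct hw_event_model ev;

    memset(&ev, 0, sizeof(ev));
    ev.conf = value;
    return ev.last_tag;
}
```

In this model a value stored through `conf` never reaches `config`, so code that later reads `config` sees zero while `last_tag` has been silently overwritten.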

drm/amdgpu/vcn2.5: fix DPG mode power off issue on instance 1 · 80ff3e10

Authored by James Zhu on Feb 05, 2020

Support pause_state for multiple instances, which fixes the vcn2.5 DPG mode
power off issue on instance 1.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sriov Don't send msg when smu suspend · 86b93fd6

Authored by Jack Zhang on Feb 05, 2020

For sriov and pp_onevf_mode, do not send messages to set smu
status, because smu doesn't support these messages under VF.

Besides, it should skip smu_suspend when pp_onevf_mode is disabled.
Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

07 Feb, 2020 · 1 commit

drm/amdgpu: move xgmi init/fini to xgmi_add/remove_device call (v2) · 0b9d3760

Authored by Hawking Zhang on Dec 23, 2019

For sriov, the psp ip block has to be initialized before the
ih block because of the dynamic register programming interface
that is needed for the vf ih ring buffer. On the other hand,
the current psp ip block hw_init function initializes the
xgmi session, which actually depends on an interrupt to
return the session context. This results in an empty xgmi ta
session id and later failures on all the xgmi ta cmds
invoked from vf. xgmi ta session initialization has to
be done after the ih ip block hw_init call.

To unify xgmi session init/fini for both the bare-metal and
sriov virtualization use scenarios, move xgmi ta init
to the xgmi_add_device call, and accordingly terminate the xgmi
ta session in the xgmi_remove_device call.

The existing suspend/resume sequence will not be changed.

v2: squash in return fix from Nirmoy
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

05 Feb, 2020 · 1 commit

drm/amdgpu: rework synchronization of VM updates v4 · 9f3cc18d

Authored by Christian König on Jan 23, 2020

If provided we only sync to the BOs reservation
object and no longer to the root PD.

v2: update comment, cleanup amdgpu_bo_sync_wait_resv
v3: use correct reservation object while clearing
v4: fix typo in amdgpu_bo_sync_wait_resv
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel: synced successfully almost 2 years ago
