Commits · ae1589f669b5e2c557a6edc9599fc1f424003b70 · openeuler / Kernel

12 Jun, 2019 12 commits

drm/amdgpu: drop the incorrect soft_reset for SRIOV · ae1589f6

Committed by Monk Liu on Jun 07, 2019

It's incorrect to do a soft reset for SRIOV: when GFX
hangs, the WREG would get stuck there because it goes the KIQ way.

The GPU reset counter is also incorrect: it always increases twice
for each timeout.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add GDS clearing workaround in later init for gfx9 · df0a8064

Committed by James Zhu on Jun 07, 2019

Due to a hardware bug, GDS has ECC errors after cold boot,
so add a GDS clearing workaround in late init for gfx9.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: remove vram_page_split kernel option (v3) · b4559a16

Committed by Tom St Denis on Jun 04, 2019

This option is no longer needed.  The default code paths
are now the only option.

v2: Add HPAGE support and a default for non contiguous maps
v3: Misread 512 pages as MiB ...
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add RLC firmware to support raven1 refresh · 80f41f84

Committed by Prike Liang on May 27, 2019

Use the SMU firmware version to identify the raven1 refresh device and
then load the corresponding RLC FW.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Suggested-by: Huang Rui <Ray.Huang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Hardcode reg access using L1 security · e0301317

Committed by Trigger Huang on Jun 03, 2019

Under Vega10 SR-IOV VF, L1 register access mode should be enabled by
default as the non-security VF will no longer be supported.
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc · e038b901

Committed by Shirish S on Jun 04, 2019

[What]
readptr read always returns zero, since most likely
these blocks are either power or clock gated.

[How]
fetch rptr after amdgpu_ring_alloc() which informs
the power management code that the block is about to be
used and hence the gating is turned off.
Signed-off-by: Louis Li <Ching-shih.Li@amd.com>
Signed-off-by: Shirish S <shirish.s@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
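
The ordering issue described in this commit can be illustrated with a small userspace sketch. It is not the driver code: `ring_sketch`, `ring_get_rptr_sketch()` and `ring_alloc_sketch()` are hypothetical stand-ins, and the model simply assumes (as the commit message suggests) that a gated block reads back zero and that allocating on the ring turns the gating off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a ring whose hardware block may be gated. */
struct ring_sketch {
    bool gated;        /* true while the block is power/clock gated */
    uint32_t hw_rptr;  /* value the hardware would report when ungated */
};

/* Assumption: a register read on a gated block returns zero. */
static uint32_t ring_get_rptr_sketch(const struct ring_sketch *ring)
{
    assert(ring != NULL);
    return ring->gated ? 0 : ring->hw_rptr;
}

/* Assumption: allocating on the ring tells power management to ungate it. */
static void ring_alloc_sketch(struct ring_sketch *ring)
{
    assert(ring != NULL);
    ring->gated = false;
}

/* Old ordering: the read pointer is sampled while the block may be gated. */
static uint32_t rptr_before_alloc(struct ring_sketch *ring)
{
    uint32_t rptr = ring_get_rptr_sketch(ring);
    ring_alloc_sketch(ring);
    return rptr;
}

/* New ordering: allocate first, then sample the read pointer. */
static uint32_t rptr_after_alloc(struct ring_sketch *ring)
{
    ring_alloc_sketch(ring);
    return ring_get_rptr_sketch(ring);
}
```

With a ring that starts gated and has a non-zero hardware read pointer, the first ordering returns zero while the second returns the real value, which is the difference the ring test depends on.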

drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2) · 91c9c23e

Committed by Louis Li on May 25, 2019

[What]
The VCE ring test fails consistently during resume in an s3 cycle, due to
mismatched read & write pointers.
On debug/analysis it was found that the rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
	[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
	[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <vce_v3_0> failed -110
	[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied, the s3 failure is no longer seen for >5k s3 cycles,
whereas it is otherwise pretty consistent.

V2: remove redundant fetch of rptr
Signed-off-by: Louis Li <Ching-shih.Li@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fixed missing to clear some EDC count · 052af915

Committed by James Zhu on Jun 04, 2019

EDC counts are related to instance and SE. They are not the same
for different types of EDC. EDC clearing is changed to be based on
each individual EDC's instance and SE number.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: stop removing BOs from the LRU v3 · 55c2e5a1

Committed by Christian König on May 10, 2019

This avoids OOM situations when we have lots of threads
submitting at the same time.

v3: apply this to the whole driver, not just CS
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: create GDS, GWS and OA in system domain · 94de7349

Committed by Christian König on May 16, 2019

And only move them in on validation. This allows for better control
when multiple processes are fighting over those resources.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drop some validation failure messages · a3e7738d

Committed by Christian König on May 17, 2019

The messages about amdgpu_cs_list_validate are duplicated because the
caller will complain into the logs as well and we can also get
interrupted by a signal here.

Also fix the caller to not report -EAGAIN from validation.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/psp: update ta_ras interface header · 5a6bfe09

Committed by Hawking Zhang on May 31, 2019

The RAS TA interface header needs to be updated to match the latest TA FW updates.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

06 Jun, 2019 3 commits

Revert "drm/amdgpu: add DRIVER_SYNCOBJ_TIMELINE to amdgpu" · 72a14e9b

Committed by Alex Deucher on Jun 05, 2019

This reverts commit 8d8a5a64.

Wait until KHR exposes the VLK support.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix a race in GPU reset with IB test (v2) · beff74bc

Committed by Alex Deucher on May 28, 2019

Split late_init into two functions, one (do_late_init) which
just does the hw init, and late_init which calls do_late_init
and schedules the IB test work.  Call do_late_init in
the GPU reset code to run the init code, but not schedule
the IB test code.  The IB test code is called directly
in the gpu reset code so no need to run the IB tests
in a separate work thread.  If we do, we end up racing.

v2: Rework late_init.  Pull out the mgpu fan boost and xgmi
pstate code into late_init so they get called in all cases.
rename the late_init worker thread to delayed work since it's
just the IB tests now which can happen later.  Schedule the
work at init and resume time.  It's not needed at reset time
because the IB tests are called directly.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
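
The split described in the first paragraph of this commit can be sketched in plain C. This is a simplified userspace model, not the amdgpu implementation: the struct, the `_sketch` function names and the counters are hypothetical, and the v2 rework (which schedules the delayed work at init and resume time) is not modelled separately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model that only counts what happened. */
struct dev_sketch {
    int hw_late_inits;          /* times the hw late-init code ran */
    int ib_tests_scheduled;     /* times the IB test work was queued */
    int ib_tests_run_directly;  /* times IB tests ran inline */
};

/* Only the hardware late-init part; schedules nothing. */
static int do_late_init_sketch(struct dev_sketch *dev)
{
    assert(dev != NULL);
    dev->hw_late_inits++;
    return 0;
}

/* Normal path: hw late init, then queue the IB tests as deferred work. */
static int late_init_sketch(struct dev_sketch *dev)
{
    int r = do_late_init_sketch(dev);
    if (r)
        return r;
    dev->ib_tests_scheduled++;
    return 0;
}

/* Reset path: hw late init, then run the IB tests inline.
 * Nothing is queued, so no worker can race with the reset. */
static int gpu_reset_sketch(struct dev_sketch *dev)
{
    int r = do_late_init_sketch(dev);
    if (r)
        return r;
    dev->ib_tests_run_directly++;
    return 0;
}
```

In this model a reset never increments the scheduled-work counter, which is the property the commit relies on to avoid the race.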

drm/amdgpu: cancel late_init_work before gpu reset · c53e4db7

Committed by xinhui pan on May 17, 2019

GPU reset will run late_init and schedule the late_init_work.  If we
keep triggering GPU resets in a short time, there are potential races.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

31 May, 2019 5 commits

drm/ttm: Make LRU removal optional v2 · 6e58ab7a

Committed by Christian König on May 10, 2019

We are already doing this for DMA-buf imports and also for
amdgpu VM BOs for quite a while now.

If this doesn't run into any problems we are probably going
to stop removing BOs from the LRU altogether.

v2: drop BUG_ON from ttm_bo_add_to_lru
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sriov: Correct some register program method · bdb50274

Committed by Emily Deng on May 31, 2019

For the VF, some registers can only be programmed through RLC.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Trigger Huang <Trigger.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Return proper error code for gws alloc API · 443e902e

Committed by Oak Zeng on May 28, 2019

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu:Fix the unpin warning about csb buffer · 789142eb

Committed by Emily Deng on May 29, 2019

As it will destroy clear_state_obj and also unpin it in
gfx_v9_0_sw_fini, there is no need to call amdgpu_bo_free_kernel
in gfx_v9_0_sw_fini, or it will trigger an unpin warning.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: ras injection use gpu address · efb426d5

Committed by xinhui pan on May 28, 2019

Injection needs a valid GPU address.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29 May, 2019 6 commits

drm/amdgpu: Need to set the baco cap before baco reset · 394e9a14

Committed by Emily Deng on May 28, 2019

For passthrough, after the VM is rebooted, the driver will do
a baco reset before doing other driver initialization while loading
the driver. To do the baco reset, it first checks
the baco reset capability. So the cap first needs to be set
from the vbios information, or baco reset won't be
enabled.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/soc15: skip reset on init · d55f33da

Committed by Alex Deucher on May 17, 2019

Not necessary on soc15 and breaks driver reload on server cards.
Acked-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add DRIVER_SYNCOBJ_TIMELINE to amdgpu · 8d8a5a64

Committed by Chunming Zhou on May 28, 2019

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Flora Cui <Flora.Cui@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add function to add/remove gws to kfd process · 71efab6a

Committed by Oak Zeng on May 08, 2019

GWS bo is shared between all kfd processes. Add function to add gws
to kfd process's bo list so gws can be evicted from and restored
for process.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add interface to alloc gws from amdgpu · ca66fb8f

Committed by Oak Zeng on May 06, 2019

Add amdgpu_amdkfd interface to alloc and free gws
from amdgpu
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add gws number to kfd topology node properties · 29e76462

Committed by Oak Zeng on May 03, 2019

Add amdgpu_amdkfd interface to get num_gws and add num_gws
to /sys/class/kfd/kfd/topology/nodes/x/properties. Only report
num_gws if the MEC FW supports GWS barriers. Currently this is
determined by a module parameter, which will be replaced
with a MEC FW version check when the firmware is ready.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

25 May, 2019 14 commits

drm/amd/doc: Add RAS documentation to guide · 74abc221

Committed by Tom St Denis on May 24, 2019

Acked-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/doc: Add XGMI sysfs documentation · 1c1e53f7

Committed by Tom St Denis on May 24, 2019

Acked-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Switch the custom "max bpc" property to the DRM prop · 1825fd34

Committed by Nicholas Kazlauskas on May 22, 2019

[Why]
The custom "max bpc" property was added to limit color depth while the
DRM one was still being merged. It's been a few kernel versions since
then and this TODO was still sticking around.

[How]
Attach the DRM max bpc property to the connector and drop all of our
custom property management. Set the max bpc to 8 by default since
DRM defaults to the max in the range which would be 16 in this case.

No behavioral changes are intended with this patch, it should just be
a refactor.

v2: Don't force 8bpc when no state is given

Cc: Leo Li <sunpeng.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add Unique Identifier sysfs file unique_id v2 · fb2dbfd2

Committed by Kent Russell on May 15, 2019

Add a file that provides a Unique ID for the GPU.
This will persist across machines and is guaranteed to be unique.
This is only available for GFX9 and newer, so older ASICs will not
have this file in the sysfs pool

v2: Store it in adev for ASICs that don't have a hwmgr
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Improve error handling for HMM · 1986a3b0

Committed by Felix Kuehling on May 07, 2019

Use unsigned long for number of pages.

Check that pfns are valid after hmm_vma_fault. If they are not,
return an error instead of continuing with invalid page pointers and
PTEs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: more descriptive message if HMM not enabled · b9c5eb5b

Committed by Philip Yang on Mar 04, 2019

If an old kernel config file is used, CONFIG_ZONE_DEVICE is not selected,
so CONFIG_HMM and CONFIG_HMM_MIRROR are not enabled, and the current driver
error message "Failed to register MMU notifier" is not clear. Inform
the user with a more descriptive message on how to fix the missing kernel
config option.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109808
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support userptr cross VMAs case with HMM · 6826cb3b

Committed by Philip Yang on Mar 04, 2019

A userptr may cross two VMAs if a forked child process (which does not
call exec after fork) mallocs a buffer, frees it, and then mallocs a
larger buffer: the kernel will create a new VMA adjacent to the old VMA
that was cloned from the parent process, so some pages of the userptr are
in the first VMA and the rest are in the second VMA.

HMM expects a range to have only one VMA, so loop over all VMAs in the
address range and create multiple ranges to handle this case. See
is_mergeable_anon_vma in mm/mmap.c for details.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
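
The "one range per VMA" idea in this commit can be sketched as a plain C helper. This is a userspace illustration only: `vma_sketch`, `range_sketch` and `split_range_by_vma()` are hypothetical names, and the VMA array is assumed to be sorted by address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a VMA and for one per-VMA range. */
struct vma_sketch   { unsigned long start, end; };  /* [start, end) */
struct range_sketch { unsigned long start, end; };  /* [start, end) */

/*
 * Split [start, end) into sub-ranges that each lie inside a single VMA.
 * Returns the number of sub-ranges written to out, or -1 if the address
 * range hits a hole between VMAs or out is too small.
 */
static int split_range_by_vma(const struct vma_sketch *vmas, int nvmas,
                              unsigned long start, unsigned long end,
                              struct range_sketch *out, int max_out)
{
    unsigned long cur = start;
    int n = 0;

    assert(vmas != NULL && out != NULL && start < end);

    for (int i = 0; i < nvmas && cur < end; i++) {
        if (vmas[i].end <= cur)
            continue;               /* VMA lies entirely below cur */
        if (vmas[i].start > cur)
            return -1;              /* unmapped hole inside the range */
        if (n == max_out)
            return -1;              /* caller's array is too small */

        out[n].start = cur;
        out[n].end = vmas[i].end < end ? vmas[i].end : end;
        cur = out[n].end;
        n++;
    }
    return cur < end ? -1 : n;      /* -1 if VMAs ran out before end */
}
```

For two adjacent VMAs `[0x1000, 0x3000)` and `[0x3000, 0x6000)`, a userptr covering `[0x2000, 0x5000)` is split into `[0x2000, 0x3000)` and `[0x3000, 0x5000)`, one range per VMA.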

drm/amdkfd: support concurrent userptr update for HMM · 6c55d6e9

Committed by Philip Yang on Mar 04, 2019

Userptr restore may have concurrent userptr invalidation after
hmm_vma_fault adds the range to the hmm->ranges list. In that case,
hmm_vma_range_done needs to be called to remove the range from the
hmm->ranges list first, before rescheduling the restore worker.
Otherwise hmm_vma_fault will add the same range to the list again,
which causes a loop in the list because range->next points to the
range itself.

Add function untrack_invalid_user_pages to reduce code duplication.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
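
Why adding the same range twice produces a loop can be shown with a toy list. This sketch uses a hypothetical singly linked list (`range_node`, `list_push_sketch()`, `list_remove_sketch()`) rather than the kernel's list implementation; it only demonstrates the failure mode named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node standing in for a tracked range. */
struct range_node {
    struct range_node *next;
};

/* Insert node at the head of the list. */
static void list_push_sketch(struct range_node **head, struct range_node *node)
{
    assert(head != NULL && node != NULL);
    node->next = *head;
    *head = node;
}

/* Unlink node from the list if it is present. */
static void list_remove_sketch(struct range_node **head, struct range_node *node)
{
    assert(head != NULL && node != NULL);
    for (struct range_node **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == node) {
            *pp = node->next;
            node->next = NULL;
            return;
        }
    }
}
```

Pushing a node that is already the list head sets `node->next` to the node itself, so any traversal never terminates. Removing the node before pushing it again keeps the list well formed, which mirrors the remove-then-reschedule order the commit describes.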

drm/amdgpu: fix HMM config dependency issue · ad595b86

Committed by Philip Yang on Feb 21, 2019

Only selecting HMM_MIRROR produces kernel config dependency warnings
if CONFIG_HMM is missing in the config. Adding a depends on HMM
solves the issue.

Add conditional compilation to fix compilation errors if HMM_MIRROR
is not enabled because the HMM config is not enabled.

Remove unused function amdgpu_ttm_tt_mark_user_pages.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: replace get_user_pages with HMM mirror helpers · 899fbde1

Committed by Philip Yang on Dec 13, 2018

Use HMM helper function hmm_vma_fault() to get physical pages backing
userptr and start CPU page table update track of those pages. Then use
hmm_vma_range_done() to check if those pages are updated before
amdgpu_cs_submit for gfx or before user queues are resumed for kfd.

If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
from scratch, for kfd, restore worker is rescheduled to retry.

HMM simplifies the CPU page table concurrent update check, so remove
the guptasklock, mmu_invalidations and last_set_pages fields from
the amdgpu_ttm_tt struct.

HMM does not pin the page (increase page ref count), so remove related
operations like release_pages(), put_page(), mark_page_dirty().
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use HMM callback to replace mmu notifier · 2c5a51f5

Committed by Philip Yang on Jul 23, 2018

Replace our MMU notifier with hmm_mirror_ops.sync_cpu_device_pagetables
callback. Enable CONFIG_HMM and CONFIG_HMM_MIRROR as a dependency in
DRM_AMDGPU_USERPTR Kconfig.

It supports both KFD userptr and gfx userptr paths.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use heavy weight for tlb invalidation on xgmi configuration · e14ba95b

Committed by shaoyunl on Oct 25, 2018

There is a bug found in vml2 xgmi logic:
mtype is always sent as NC on the VMC to TC interface for a page walk,
regardless of whether the request is being sent to local or remote GPU.
NC means non-coherent and will cause the VMC return data to be cached
in the TCC (versus UC – uncached will not cache the data). Since the
page table updates are being done by SDMA/HDP, then TCC will never be
updated and the GC VML2 will continue to hit on the TCC and never get
the updated page tables and result in a fault.
Heavy-weight TLB invalidation does a WB/INVAL of the L1/L2 GL data
caches so the TCC will not be hit on the next request.
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use pcie_bandwidth_available rather than open coding it · dbaa922b

Committed by Alex Deucher on Apr 11, 2019

It does the same thing we were doing already.  I thought it needed
work for gen3/4 speeds, but that seems to be covered already.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use div64_ul for 32-bit compatibility v1 · d6ee400e

Committed by Slava Abramov on May 16, 2019

v1: replace casting to unsigned long with div64_ul
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Slava Abramov <slava.abramov@amd.com>
Tested-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
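
A short userspace sketch of why a cast is not a safe substitute for a 64-bit division helper on 32-bit builds. The functions below are hypothetical stand-ins, not the kernel's `div64_ul()` from `linux/math64.h`, and `uint32_t` is used to model a 32-bit `unsigned long`; the exact expression changed by the commit is not shown on this page and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit-by-long division helper: the dividend keeps
 * all 64 bits, only the divisor is long-sized. */
static uint64_t div64_ul_sketch(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* What casting the dividend to a 32-bit unsigned long does: the upper
 * 32 bits are discarded before the division takes place. */
static uint32_t div_by_cast_sketch(uint64_t dividend, uint32_t divisor)
{
    assert(divisor != 0);
    return (uint32_t)dividend / divisor;
}
```

For a dividend of `0x300000000` and a divisor of 3, the helper-style division yields `0x100000000`, while the cast version truncates the dividend to 0 first and yields 0.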

openeuler / Kernel: synced successfully more than 1 year ago