Commits · 640ae42efb828be69a9ee6ac88fb3d5a3e678ddf · openeuler / Kernel

24 Sep, 2021 12 commits

drm/amdgpu: Updated RAS infrastructure · 640ae42e

Committed by John Clements on Sep 22, 2021

Update RAS infrastructure to support RAS query for MCA subblocks

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move amdgpu_virt_release_full_gpu to fini_early stage · 6effad8a

Committed by Guchun Chen on Sep 18, 2021

adev->rmmio is set to NULL in amdgpu_device_unmap_mmio to prevent
access after pci_remove. However, in the SRIOV case,
amdgpu_virt_release_full_gpu still uses adev->rmmio for access after
amdgpu_device_unmap_mmio. This patch moves that SRIOV call earlier,
to the fini_early stage.

Fixes: 07775fc1 ("drm/amdgpu: Unmap all MMIO mappings")
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix resume failures when device is gone · ebe86a57

Committed by Andrey Grodzovsky on Sep 17, 2021

Problem:
When the device is suspended and then unplugged while suspended,
all HW programming during resume fails, leaving SW in a bad state
for the pci remove handling that follows.
Because the device is first resumed and only later removed,
we cannot rely on drm_dev_enter/exit here.

Fix:
Use the flag we already use for PCIe error recovery to avoid
accessing registers. This allows the pm resume sequence to
complete successfully and pci remove to finish.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
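
The flag-based guard described in the fix can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct fake_dev`, `no_hw_access`, `fake_rreg` and `fake_wreg` are hypothetical names chosen for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-device driver state. */
struct fake_dev {
    volatile uint32_t *mmio; /* register window, NULL once unmapped */
    size_t num_regs;
    bool no_hw_access;       /* set when the hardware is known to be gone */
};

/* Read a register, or return 0 without touching MMIO if the flag is set. */
uint32_t fake_rreg(const struct fake_dev *dev, size_t reg)
{
    if (dev->no_hw_access)
        return 0;
    assert(dev->mmio != NULL && reg < dev->num_regs);
    return dev->mmio[reg];
}

/* Write a register, or silently skip the write if the flag is set. */
void fake_wreg(struct fake_dev *dev, size_t reg, uint32_t val)
{
    if (dev->no_hw_access)
        return;
    assert(dev->mmio != NULL && reg < dev->num_regs);
    dev->mmio[reg] = val;
}
```

With the flag set, a resume path built on these helpers runs to completion as a series of no-ops, so the later remove path starts from consistent software state.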

drm/amdgpu: Fix MMIO access page fault · c03509cb

Committed by Andrey Grodzovsky on Sep 16, 2021

Add more guards to MMIO access post device
unbind/unplug

Bug: https://bugs.archlinux.org/task/72092?project=1&order=dateopened&sort=desc&pagenum=1
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix crash on device remove/driver unload · d82e2c24

Committed by Andrey Grodzovsky on Sep 15, 2021

Crash:
BUG: unable to handle page fault for address: 00000000000010e1
RIP: 0010:vega10_power_gate_vce+0x26/0x50 [amdgpu]
Call Trace:
pp_set_powergating_by_smu+0x16a/0x2b0 [amdgpu]
amdgpu_dpm_set_powergating_by_smu+0x92/0xf0 [amdgpu]
amdgpu_dpm_enable_vce+0x2e/0xc0 [amdgpu]
vce_v4_0_hw_fini+0x95/0xa0 [amdgpu]
amdgpu_device_fini_hw+0x232/0x30d [amdgpu]
amdgpu_driver_unload_kms+0x5c/0x80 [amdgpu]
amdgpu_pci_remove+0x27/0x40 [amdgpu]
pci_device_remove+0x3e/0xb0
device_release_driver_internal+0x103/0x1d0
device_release_driver+0x12/0x20
pci_stop_bus_device+0x79/0xa0
pci_stop_and_remove_bus_device_locked+0x1b/0x30
remove_store+0x7b/0x90
dev_attr_store+0x17/0x30
sysfs_kf_write+0x4b/0x60
kernfs_fop_write_iter+0x151/0x1e0

Why:
VCE/UVD had a dependency on the SMC block for their suspend, but
the SMC block is the first to do HW fini due to some constraints.

How:
Since the original patch was dealing with suspend issues,
move the SMC block dependency back into the suspend hooks, as
was done in V1 of the original patches.
Keep flushing idle work in both the suspend and HW fini sequences
since it's essential in both cases.

Fixes: 859e4659 ("drm/amdgpu: add missing cleanups for more ASICs on UVD/VCE suspend")
Fixes: bf756fb8 ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend")
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix uvd ib test timeout when use pre-allocated BO · 0a226780

Committed by xinhui pan on Sep 16, 2021

Now we use the same BO for the create/destroy msg, so destroy will wait
for the fence returned from create to be signaled. The default timeout
value in destroy is 10ms, which is too short.

Let's wait on both fences with the specific timeout.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Put drm_dev_enter/exit outside hot codepath · b2fe31cf

Committed by xinhui pan on Sep 15, 2021

We hit a soft hang while running a memory pressure test on a NUMA system.
After a quick look, this is because kfd invalidates/validates userptr
memory frequently with the process_info lock held.
It looks like updating the page table mapping uses too much CPU time.

perf top shows:
75.81%  [kernel]       [k] __srcu_read_unlock
 6.19%  [amdgpu]       [k] amdgpu_gmc_set_pte_pde
 3.56%  [kernel]       [k] __srcu_read_lock
 2.20%  [amdgpu]       [k] amdgpu_vm_cpu_update
 2.20%  [kernel]       [k] __sg_page_iter_dma_next
 2.15%  [drm]          [k] drm_dev_enter
 1.70%  [drm]          [k] drm_prime_sg_to_dma_addr_array
 1.18%  [kernel]       [k] __sg_alloc_table_from_pages
 1.09%  [drm]          [k] drm_dev_exit

So move drm_dev_enter/exit out of the gmc code and let the callers do it
instead. The callers are gart_unbind, gart_map, vm_clear_bo,
vm_update_pdes and gmc_init_pdb0. vm_bo_update_mapping already calls it.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
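
The cost difference behind this change can be illustrated with a user-space model that only counts how often the guard is taken. It is a sketch, not driver code: `model_dev_enter` and `model_dev_exit` are hypothetical stand-ins for drm_dev_enter/exit, and the page table is a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts every enter and every exit call of the modelled guard. */
static unsigned long guard_calls;

int model_dev_enter(void)
{
    guard_calls++;
    return 1; /* the modelled device is always present */
}

void model_dev_exit(void)
{
    guard_calls++;
}

unsigned long model_guard_calls(void) { return guard_calls; }
void model_reset(void) { guard_calls = 0; }

/* Before: the guard sits inside the per-entry helper (hot path). */
void update_guard_per_entry(uint64_t *pt, size_t n, uint64_t base)
{
    assert(pt != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!model_dev_enter())
            return;
        pt[i] = base + i;
        model_dev_exit();
    }
}

/* After: the caller takes the guard once around the whole batch. */
void update_guard_per_batch(uint64_t *pt, size_t n, uint64_t base)
{
    assert(pt != NULL);
    if (!model_dev_enter())
        return;
    for (size_t i = 0; i < n; i++)
        pt[i] = base + i;
    model_dev_exit();
}
```

For n entries the first variant pays 2n guard calls and the second pays 2, which is the kind of overhead the perf output attributes to `__srcu_read_unlock` and `__srcu_read_lock`.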

drm/amdgpu: Resolve nBIF RAS error harvesting bug · 226f4f5a

Committed by John Clements on Sep 15, 2021

Set correct RAS nBIF error query register offsets on aldebaran

Signed-off-by: John Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update PSP TA unload function · 17c6805a

Committed by Candice Li on Sep 10, 2021

Update PSP TA unload function to use PSP TA context as input argument.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Conform ASD header/loading to generic TA systems · 3f83f17b

Committed by Candice Li on Sep 09, 2021

Update asd_context structure and add asd_initialize function to
conform ASD header/loading to generic TA systems.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Demote TMZ unsupported log message from warning to info · 31ea4344

Committed by Paul Menzel on Sep 13, 2021

As the user cannot do anything about the unsupported Trusted Memory Zone
(TMZ) feature, do not warn about it, but make it informational, so
demote the log level from warning to info.

Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count · 6cd1f9b4

Committed by Michel Dänzer on Sep 09, 2021

This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
                 from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
 to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
   90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
 1985 |         max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d4670 "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
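
The shape of the fix, a plain declaration in the header and an ordinary definition in one compilation unit, can be sketched as follows. The function name and the three size constants are hypothetical placeholders, not the driver's values.

```c
#include <assert.h>
#include <stdint.h>

/* What a shared header would carry: a plain declaration. Marking this
 * `inline` without making the body visible to callers is the pattern
 * that triggered the build error quoted above. */
uint32_t example_max_record_count(void);

/* What a single .c file would carry: the one definition. */
#define EXAMPLE_TABLE_BYTES  4096u /* hypothetical storage size */
#define EXAMPLE_HEADER_BYTES   20u /* hypothetical table header size */
#define EXAMPLE_RECORD_BYTES   24u /* hypothetical record size */

uint32_t example_max_record_count(void)
{
    assert(EXAMPLE_TABLE_BYTES > EXAMPLE_HEADER_BYTES);
    return (EXAMPLE_TABLE_BYTES - EXAMPLE_HEADER_BYTES) / EXAMPLE_RECORD_BYTES;
}
```

The alternative mentioned in the message, a `static inline` definition placed in the header, would also build, at the price of exposing the constants to every includer.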

15 Sep, 2021 15 commits

drm/amdgpu: Fix a race of IB test · 0fcfb300

Committed by xinhui pan on Sep 11, 2021

Direct IB submission should be exclusive, so use the write lock.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: VCN avoid memory allocation during IB test · 405a81ae

Committed by xinhui pan on Sep 10, 2021

Allocate the extra msg from the direct IB pool.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: VCE avoid memory allocation during IB test · cb9038aa

Committed by xinhui pan on Sep 10, 2021

Allocate the extra msg from the direct IB pool.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: UVD avoid memory allocation during IB test · 68331d7c

Committed by xinhui pan on Sep 10, 2021

Move BO allocation into sw_init.

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Unify PSP TA context · de3a1e33

Committed by Candice Li on Sep 08, 2021

Remove all TA binary structures and add the specific binary
structure in struct ta_context.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: move iommu_resume before ip init/resume · 9cec53c1

Committed by James Zhu on Sep 07, 2021

Separate iommu_resume from kfd_resume, and move it before
other amdgpu ip init/resume.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add amdgpu_amdkfd_resume_iommu · ea20e246

Committed by James Zhu on Sep 07, 2021

Add amdgpu_amdkfd_resume_iommu for amdgpu.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: separate kfd_iommu_resume from kfd_resume · f8846323

Committed by James Zhu on Sep 07, 2021

Separate kfd_iommu_resume from kfd_resume for fine-tuning
of amdgpu device init/resume/reset/recovery sequence.

v2: squash in fix for !CONFIG_HSA_AMD

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Get atomicOps info from Host for sriov setup · 8e6d0b69

Committed by shaoyunl on Sep 08, 2021

The AtomicOp Requester Enable bit is reserved in VFs and the PF value
applies to all associated VFs, so the guest driver cannot directly enable
atomicOps for a VF; it depends on the PF to enable it. In the current
design, the amdgpu driver gets the enabled atomicOps bits through private
pf2vf data.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Increase direct IB pool size · a7496559

Committed by xinhui pan on Sep 09, 2021

The direct IB pool is used for the vce/vcn IB extra msg too. Increase its
size to AMDGPU_IB_POOL_SIZE.

v2: Squash in unused variable removal

Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update RAS trigger error block support · 3771449b

Committed by John Clements on Sep 09, 2021

Added trigger error support for MP0/MP1/MPIO blocks

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update RAS status print · 334f81d1

Committed by John Clements on Sep 09, 2021

Remove unnecessary RAS status prints

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refactor function to init no-psp fw · 02f958a2

Committed by Likun Gao on Sep 09, 2021

Refactor amdgpu_ucode_init_single_fw to make it more readable, as
this function currently has to handle too many kinds of ucode.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup debugfs for amdgpu rings · 62d266b2

Committed by Nirmoy Das on Sep 02, 2021

Use the debugfs_create_file_size API for creating ring debugfs, and as
it is a void-returning API, change the return type of the
amdgpu_debugfs_ring_init API as well. Also clean up surrounding code.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use IS_ERR for debugfs APIs · 59715cff

Committed by Nirmoy Das on Sep 02, 2021

debugfs APIs return an encoded error, so use
IS_ERR to check the return value.

v2: return PTR_ERR(ent)

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
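
To show why a NULL check misses an error-encoded pointer, here is a user-space model of the pointer-encoded-errno convention. The `model_*` helpers are written for illustration and only mimic the kernel's ERR_PTR/IS_ERR/PTR_ERR; `model_create_entry` is a hypothetical creator function, not a debugfs call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *model_err_ptr(long err)
{
    assert(err < 0 && err >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static inline long model_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True only for pointers in the top MODEL_MAX_ERRNO addresses. */
static inline int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical creator: fails with an encoded error, never with NULL. */
void *model_create_entry(const char *name)
{
    static int entry;

    if (name == NULL || *name == '\0')
        return model_err_ptr(-EINVAL);
    return &entry;
}

/* Caller pattern from the commit: test with IS_ERR, return PTR_ERR. */
int model_init(const char *name)
{
    void *ent = model_create_entry(name);

    if (model_is_err(ent))
        return (int)model_ptr_err(ent);
    return 0;
}
```

In this model a failed `model_create_entry` returns a non-NULL pointer, so an `if (!ent)` test would treat the failure as success.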

08 Sep, 2021 5 commits

drm/amdgpu: sdma: clean up identation · e8ba4922

Committed by Colin Ian King on Sep 02, 2021

There is a statement that is indented incorrectly. Clean it up.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: clean up inconsistent indenting · 9ae807f0

Committed by Colin Ian King on Sep 02, 2021

There are a couple of statements that are indented one character
too deeply. Clean these up.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unused amdgpu_bo_validate · a7181b52

Committed by Christian König on Sep 07, 2021

Just drop some dead code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix use after free during BO move · 101ba90f

Committed by Christian König on Sep 07, 2021

The memory backing old_mem is already freed at that point, so move the
check a bit further up.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: bfa3357e ("drm/ttm: allocate resource object instead of embedding it v2")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1699
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
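
The ordering rule behind this fix, evaluate everything that depends on the old object before it is released, can be sketched in user space. All names here (`struct ex_resource`, `ex_move`, the `EX_MEM_*` values) are hypothetical and the logic is a simplification, not the driver's move path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum ex_mem_type { EX_MEM_SYSTEM = 0, EX_MEM_VRAM = 2 };

struct ex_resource {
    enum ex_mem_type mem_type;
};

/* Replace *slot with new_mem and report whether the buffer left VRAM.
 * The check reads old_mem, so it must run before old_mem is freed. */
bool ex_move(struct ex_resource **slot, struct ex_resource *new_mem)
{
    assert(slot != NULL && *slot != NULL && new_mem != NULL);

    struct ex_resource *old_mem = *slot;

    /* Check moved up: old_mem is still valid here. */
    bool left_vram = old_mem->mem_type == EX_MEM_VRAM &&
                     new_mem->mem_type != EX_MEM_VRAM;

    free(old_mem); /* old_mem must not be dereferenced past this line */
    *slot = new_mem;
    return left_vram;
}
```

Placing the `left_vram` computation after `free(old_mem)` would reproduce the class of bug the commit describes.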

drm/amdgpu: Create common PSP TA load function · ac1509d1

Committed by Candice Li on Sep 04, 2021

Create common PSP TA load function and update PSP ta_mem_context
with size information.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

03 Sep, 2021 1 commit

drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10 · cd54323e

Committed by Ernst Sjöstrand on Sep 02, 2021

Seems like newer cards can have even more instances now.
Found by UBSAN: array-index-out-of-bounds in
drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29
index 8 is out of range for type 'uint32_t *[8]'

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
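
The UBSAN report above is an index running past a fixed-size array of pointers. The sketch below models that table with the enlarged bound and adds an explicit range check. The range check is an assumption added for illustration and is not something the commit claims to introduce; `struct ex_hwip` and `ex_hwip_set_base` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_HWIP_MAX_INSTANCE 10 /* models the raised limit */

/* One register-base pointer per hardware instance. */
struct ex_hwip {
    uint32_t *reg_offset[EX_HWIP_MAX_INSTANCE];
};

/* Store a base pointer for instance `inst`, rejecting out-of-range
 * indices instead of writing past the end of the array. */
int ex_hwip_set_base(struct ex_hwip *ip, size_t inst, uint32_t *base)
{
    assert(ip != NULL);
    if (inst >= EX_HWIP_MAX_INSTANCE)
        return -1;
    ip->reg_offset[inst] = base;
    return 0;
}
```

With a bound of 8, index 8 from the report would be rejected by this check; with the bound at 10 it is stored normally.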

02 Sep, 2021 7 commits

drm/amdgpu:schedule vce/vcn encode based on priority · 7d7630fc

Committed by Satyajit Sahu on Aug 26, 2021

Schedule the encode job in VCE/VCN encode ring
based on the priority set by UMD.

Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: set the priority for each encode ring · 0ad29a4e

Committed by Satyajit Sahu on Aug 27, 2021

VCN has multiple rings. Set the proper priority level for each
encode ring while initializing.

Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vce: set the priority for each ring · 080e613c

Committed by Satyajit Sahu on Aug 27, 2021

VCE has multiple rings. Set the proper priority level for each
ring while initializing.

Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: add mpio to ras block · a0a2f7bb

Committed by Candice Li on Aug 27, 2021

Add MPIO to RAS block

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: consolidate PSP TA unload function · 25c94b33

Committed by Candice Li on Aug 27, 2021

Create common PSP TA unload function and replace all common TA unloading
sequences.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: New debugfs interface for MMIO registers (v5) · 37df9560

Committed by Tom St Denis on Aug 20, 2021

This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range, which
the previous one didn't. With this new design we have room to grow
the flexibility of the file as needed.

(v2): Move read/write to .read/.write, fix style, add comment
      for IOCTL data structure

(v3): C style comments

(v4): use u32 in struct and remove offset variable

(v5): Drop flag clearing in op function, use 0xFFFFFFFF for broadcast
      instead of 0x3FF, use mutex for op/ioctl.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: detach ring priority from gfx priority · 34eaf30f

Committed by Nirmoy Das on Aug 25, 2021

Currently AMDGPU_RING_PRIO_MAX is a redefinition of the
max gfx hwip priority. This won't work well once we
have a hwip with a different set of priorities than gfx.
Also, HW ring priorities are different from ring priorities.

Create a global enum for ring priority levels which each
HWIP can use to define its own priority levels.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
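
One way to picture the split between generic ring priorities and per-IP hardware priorities is a shared enum plus one mapping function per hardware block. Every identifier below is hypothetical, and the specific mappings are illustrative assumptions rather than the driver's definitions.

```c
#include <assert.h>

/* Generic, IP-independent ring priority levels. */
enum ex_ring_prio {
    EX_RING_PRIO_0,
    EX_RING_PRIO_1,
    EX_RING_PRIO_2,
    EX_RING_PRIO_MAX /* number of levels, not a valid level itself */
};

/* Each hardware block keeps its own notion of hardware priority. */
enum ex_gfx_hw_prio { EX_GFX_HW_NORMAL, EX_GFX_HW_HIGH };
enum ex_enc_hw_prio { EX_ENC_HW_GENERAL, EX_ENC_HW_HIGH, EX_ENC_HW_VERY_HIGH };

/* A block with two hardware levels folds the upper ring levels together. */
enum ex_gfx_hw_prio ex_gfx_map_prio(enum ex_ring_prio prio)
{
    assert(prio < EX_RING_PRIO_MAX);
    return prio == EX_RING_PRIO_0 ? EX_GFX_HW_NORMAL : EX_GFX_HW_HIGH;
}

/* A block with three hardware levels maps the ring levels one to one. */
enum ex_enc_hw_prio ex_enc_map_prio(enum ex_ring_prio prio)
{
    assert(prio < EX_RING_PRIO_MAX);
    switch (prio) {
    case EX_RING_PRIO_2:
        return EX_ENC_HW_VERY_HIGH;
    case EX_RING_PRIO_1:
        return EX_ENC_HW_HIGH;
    default:
        return EX_ENC_HW_GENERAL;
    }
}
```

Keeping the generic enum separate means a new block with a different number of hardware levels only needs its own mapping function.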

openeuler / Kernel synced successfully more than 1 year ago
