提交 · e77a541f5dea0a2ff9d6a40dcda9b284e1e736fe · openeuler / Kernel

24 6月, 2022 1 次提交

drm/amdkfd: Enable GFX11 usermode queue oversubscription · e77a541f

由 Graham Sider 提交于 5月 11, 2022

Starting with GFX11, MES requires wptr BOs to be GTT allocated/mapped to
GART for usermode queues in order to support oversubscription. In the
case that work is submitted to an unmapped queue, MES must have a GART
wptr address to determine whether the queue should be mapped.

This change is accompanied with changes in MES and is applicable for
MES_API_VERSION >= 2.

v3:
- Use amdgpu_vm_bo_lookup_mapping for wptr_bo mapping lookup
- Move wptr_bo refcount increment to amdgpu_amdkfd_map_gtt_bo_to_gart
- Remove list_del_init from amdgpu_amdkfd_map_gtt_bo_to_gart
- Cleanup/fix create_queue wptr_bo error handling
v4:
- Add MES version shift/mask defines to amdgpu_mes.h
- Change version check from MES_VERSION to MES_API_VERSION
- Add check in kfd_ioctl_create_queue before wptr bo pin/GART map to
ensure bo is a single page.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NPhilip Yang <Philip.Yang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e77a541f

23 6月, 2022 2 次提交

drm/amdkfd: Free queue after unmap queue success · ab8529b0

由 Philip Yang 提交于 6月 10, 2022

After queue unmap or remove from MES successfully, free queue sysfs
entries, doorbell and remove from queue list. Otherwise, application may
destroy queue again, cause below kernel warning or crash backtrace.

For outstanding queues, either application forget to destroy or failed
to destroy, kfd_process_notifier_release will remove queue sysfs
entries, kfd_process_wq_release will free queue doorbell.

v2: decrement_queue_count for MES queue

 refcount_t: underflow; use-after-free.
 WARNING: CPU: 7 PID: 3053 at lib/refcount.c:28
  Call Trace:
   kobject_put+0xd6/0x1a0
   kfd_procfs_del_queue+0x27/0x30 [amdgpu]
   pqm_destroy_queue+0xeb/0x240 [amdgpu]
   kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
   kfd_ioctl+0x27d/0x500 [amdgpu]
   do_syscall_64+0x35/0x80

 WARNING: CPU: 2 PID: 3053 at drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device_queue_manager.c:400
  Call Trace:
   deallocate_doorbell.isra.0+0x39/0x40 [amdgpu]
   destroy_queue_cpsch+0xb3/0x270 [amdgpu]
   pqm_destroy_queue+0x108/0x240 [amdgpu]
   kfd_ioctl_destroy_queue+0x32/0x70 [amdgpu]
   kfd_ioctl+0x27d/0x500 [amdgpu]

 general protection fault, probably for non-canonical address
0xdead000000000108:
 Call Trace:
  pqm_destroy_queue+0xf0/0x200 [amdgpu]
  kfd_ioctl_destroy_queue+0x2f/0x60 [amdgpu]
  kfd_ioctl+0x19b/0x600 [amdgpu]
Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
Reviewed-by: NGraham Sider <Graham.Sider@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ab8529b0

drm/amdkfd: Add queue to MES if it becomes active · f4f9b827

由 Philip Yang 提交于 6月 15, 2022

We remove the user queue from MES scheduler to update queue properties.
If the queue becomes active after updating, add the user queue to MES
scheduler, to be able to handle command packet submission.

v2: don't break pqm_set_gws
Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
Reviewed-by: NGraham Sider <Graham.Sider@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f4f9b827

16 5月, 2022 1 次提交

drm/amdkfd: Fix static checker warning on MES queue type · 04fd0739

由 Graham Sider 提交于 5月 12, 2022

convert_to_mes_queue_type return can be negative, but
queue_input.queue_type is uint32_t. Put return in integer var and cast
to unsigned after negative check.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

04fd0739

04 5月, 2022 1 次提交

drm/amdkfd: Add KFD support for soc21 v3 · cc009e61

由 Mukul Joshi 提交于 4月 26, 2022

Add initial support for soc21 in KFD compute
driver (Mukul)
- Add new definition for soc21 device.
- Add new file for amdgpu-kfd interface for GFX11 family.
- Add new file for queue management, interrupt handling,
  mqd management for GFX11 family in KFD driver.
- Related changes/updates for soc21 device in
  KFD driver.
- Repurpose last 2 entries of SDMA MQD for driver use.

v2: Add an optional argument into update queue operation (Mukul)

v3: Switch to ip version check, replace kgd_dev with
    amdgpu_device (Hawking)
Signed-off-by: NMukul Joshi <mukul.joshi@amd.com>
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NOak Zeng <Oak.Zeng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cc009e61

28 4月, 2022 1 次提交

drm/amdkfd: Fix GWS queue count · 7c6b6e18

由 David Yat Sin 提交于 4月 18, 2022

dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated
each time the queue gets evicted.

Fixes: b8020b03 ("drm/amdkfd: Enable over-subscription with >1 GWS queue")
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7c6b6e18

21 4月, 2022 1 次提交

drm/amdkfd: Fix GWS queue count · ab4d51d4

由 David Yat Sin 提交于 4月 18, 2022

dqm->gws_queue_count and pdd->qpd.mapped_gws_queue need to be updated
each time the queue gets evicted.

Fixes: b8020b03 ("drm/amdkfd: Enable over-subscription with >1 GWS queue")
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ab4d51d4

10 3月, 2022 1 次提交

drm/amdkfd: bail out early if no get_atc_vmid_pasid_mapping_info · d55957fb

由 Yifan Zhang 提交于 3月 09, 2022

it makes no sense to continue with an undefined vmid.

Fixes: c8b0507f ("drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call")
Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com>
Reported-by: NNathan Chancellor <nathan@kernel.org>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d55957fb

05 3月, 2022 1 次提交

drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call · c8b0507f

由 Yifan Zhang 提交于 3月 03, 2022

Fix the NULL point issue:

[ 3076.255609] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 3076.255624] #PF: supervisor instruction fetch in kernel mode
[ 3076.255637] #PF: error_code(0x0010) - not-present page
[ 3076.255649] PGD 0 P4D 0
[ 3076.255660] Oops: 0010 [#1] SMP NOPTI
[ 3076.255669] CPU: 20 PID: 2415 Comm: kfdtest Tainted: G        W  OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[ 3076.255691] Hardware name: AMD Splinter/Splinter-RPL, BIOS VS2326337N.FD 02/07/2022
[ 3076.255706] RIP: 0010:0x0
[ 3076.255718] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 3076.255732] RSP: 0018:ffffb64283e3fc10 EFLAGS: 00010297
[ 3076.255744] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000027
[ 3076.255759] RDX: ffffb64283e3fc1e RSI: 0000000000000008 RDI: ffff8c7a87f60000
[ 3076.255776] RBP: ffffb64283e3fc78 R08: ffff8c7d88518ac0 R09: ffffb64283e3fa60
[ 3076.255791] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000f
[ 3076.255805] R13: ffff8c7bdcea5800 R14: ffff8c7a9f3f3000 R15: ffff8c7a8696bc00
[ 3076.255820] FS:  0000000000000000(0000) GS:ffff8c7d88500000(0000) knlGS:0000000000000000
[ 3076.255839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3076.255851] CR2: ffffffffffffffd6 CR3: 0000000109e3c000 CR4: 0000000000750ee0
[ 3076.255866] PKRU: 55555554
[ 3076.255873] Call Trace:
[ 3076.255884]  dbgdev_wave_reset_wavefronts+0x72/0x160 [amdgpu]
[ 3076.256025]  process_termination_cpsch.cold+0x26/0x2f [amdgpu]
[ 3076.256182]  ? ktime_get_mono_fast_ns+0x4e/0xa0
[ 3076.256196]  kfd_process_dequeue_from_all_devices+0x49/0x70 [amdgpu]
[ 3076.256328]  kfd_process_notifier_release+0x187/0x2b0 [amdgpu]
[ 3076.256451]  ? mn_itree_inv_end+0xdc/0x110
[ 3076.256463]  __mmu_notifier_release+0x74/0x1f0
[ 3076.256474]  exit_mmap+0x170/0x200
[ 3076.256484]  ? __handle_mm_fault+0x677/0x920
[ 3076.256496]  ? _cond_resched+0x19/0x30
[ 3076.256507]  mmput+0x5d/0x130
[ 3076.256518]  do_exit+0x332/0xaf0
[ 3076.256526]  ? handle_mm_fault+0xd7/0x2b0
[ 3076.256537]  do_group_exit+0x43/0xa0
[ 3076.256548]  __x64_sys_exit_group+0x18/0x20
[ 3076.256559]  do_syscall_64+0x38/0x90
[ 3076.256569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c8b0507f

15 2月, 2022 3 次提交

drm/amdkfd: remove unneeded unmap single queue option · d2cb0b21

由 Jonathan Kim 提交于 2月 10, 2022

The KFD only unmaps all queues, all dynamics queues or all process queues
since RUN_LIST is mapped with all KFD queues.

There's no need to provide a single type unmap so remove this option.
Signed-off-by: NJonathan Kim <jonathan.kim@amd.com>
Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d2cb0b21

drm/amdkfd: Fix leftover errors and warnings · 2243f493

由 Rajneesh Bhardwaj 提交于 2月 10, 2022

A bunch of errors and warnings are leftover KFD over the years, attempt
to fix the errors and most warnings reported by checkpatch tool. Still a
few warnings remain which may be false positives so ignore them for now.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2243f493

drm/amdkfd: update SPDX license header · d87f36a0

由 Rajneesh Bhardwaj 提交于 2月 10, 2022

Update the SPDX License header for all the KFD files.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d87f36a0

10 2月, 2022 2 次提交

drm/amdkfd: Remove unused old debugger implementation · 5bdd3eb2

由 Mukul Joshi 提交于 2月 04, 2022

Cleanup the kfd code by removing the unused old debugger
implementation.
The address watch was only ever implemented in the upstream
driver for GFXv7 (Kaveri). The user mode tools runtime using
this API was never open-sourced. Work on the old debugger
prototype that used this API has been discontinued years ago.
Only a small piece of resetting wavefronts is kept and
is moved to kfd_device_queue_manager.c.
Signed-off-by: NMukul Joshi <mukul.joshi@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5bdd3eb2

drm/amdkfd: rename kfd_process_vm_fault to kfd_dqm_evict_pasid · 03e5b167

由 Tao Zhou 提交于 2月 07, 2022

As the function is used in more different cases, use a more general
name.
Signed-off-by: NTao Zhou <tao.zhou1@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

03e5b167

08 2月, 2022 4 次提交

drm/amdkfd: CRIU checkpoint and restore queue control stack · 3a9822d7

由 David Yat Sin 提交于 1月 25, 2021

Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3a9822d7

drm/amdkfd: CRIU checkpoint and restore queue mqds · 42c6c482

由 David Yat Sin 提交于 1月 25, 2021

Checkpoint contents of queue MQD's on CRIU dump and restore them during
CRIU restore.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

42c6c482

drm/amdkfd: CRIU restore queue doorbell id · 5bb6a8fa

由 David Yat Sin 提交于 1月 25, 2021

When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5bb6a8fa

drm/amdkfd: CRIU restore sdma id for queues · 2485c12c

由 David Yat Sin 提交于 1月 25, 2021

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NDavid Yat Sin <david.yatsin@amd.com>
Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2485c12c

12 1月, 2022 1 次提交

drm/amdkfd: Fix DQM asserts on Hawaii · 6f4cb84a

由 Felix Kuehling 提交于 12月 07, 2021

start_nocpsch would never set dqm->sched_running on Hawaii due to an
early return statement. This would trigger asserts in other functions
and end up in inconsistent states.

Bug: https://github.com/RadeonOpenCompute/ROCm/issues/1624Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NKent Russell <kent.russell@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6f4cb84a

29 12月, 2021 2 次提交

drm/amdkfd: add reset queue function for RAS poison (v2) · dec63443

由 Tao Zhou 提交于 12月 16, 2021

The new interface unmaps queues with reset mode for the process consumes
RAS poison, it's only for compute queue.

v2: rename the function to reset_queues.
Signed-off-by: NTao Zhou <tao.zhou1@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dec63443

drm/amdkfd: add reset parameter for unmap queues · f6b80c04

由 Tao Zhou 提交于 12月 09, 2021

So we can set reset mode for unmap operation, no functional change.
Signed-off-by: NTao Zhou <tao.zhou1@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f6b80c04

02 12月, 2021 1 次提交

drm/amdkfd: add kfd_device_info_init function · f0dc99a6

由 Graham Sider 提交于 11月 17, 2021

Initializes kfd->device_info given either asic_type (enum) if GFX
version is less than GFX9, or GC IP version if greater. Also takes in vf
and the target compiler gfx version. Uses SDMA version to determine
num_sdma_queues_per_engine.

Convert device_info to a non-pointer member of kfd, change references
accordingly.

Change unsupported asic condition to only probe f2g, move device_info
initialization post-switch.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f0dc99a6

23 11月, 2021 2 次提交

drm/amdkfd: Retrieve SDMA numbers from amdgpu · ee2f17f4

由 Amber Lin 提交于 11月 18, 2021

Instead of hard coding the number of sdma engines and the number of
sdma_xgmi engines in the device_info table, get the number of toal SDMA
instances from amdgpu. The first two engines are sdma engines and the
rest are sdma-xgmi engines unless the ASIC doesn't support XGMI.

v2: add kfd_ prefix to non static function names
Signed-off-by: NAmber Lin <Amber.Lin@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ee2f17f4

drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again · c96cb659

由 shaoyunl 提交于 11月 14, 2021

In SRIOV configuration, the reset may failed to bring asic back to normal but stop cpsch
already been called, the start_cpsch will not be called since there is no resume in this
case. When reset been triggered again, driver should avoid to do uninitialization again.
Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c96cb659

18 11月, 2021 9 次提交

drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again · 2cf49e00

由 shaoyunl 提交于 11月 14, 2021

2cf49e00

drm/amdkfd: replace asic_family with asic_type · 7eb0502a

由 Graham Sider 提交于 11月 10, 2021

asic_family was a duplicate of asic_type, both of type amd_asic_type.
Replace all instances of device_info->asic_family with adev->asic_type
and remove asic_family from device_info.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7eb0502a

drm/amdkfd: convert misc checks to IP version checking · 046e674b

由 Graham Sider 提交于 11月 09, 2021

Switch to IP version checking instead of asic_type on various KFD
version checks.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

046e674b

drm/amdkfd: convert switches to IP version checking · e4804a39

由 Graham Sider 提交于 10月 28, 2021

Converts KFD switch statements to use IP version checking instead
of asic_type.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e4804a39

drm/amdkfd: convert KFD_IS_SOC to IP version checking · dd0ae064

由 Graham Sider 提交于 11月 09, 2021

Defined as GC HWIP >= IP_VERSION(9, 0, 1).

Also defines KFD_GC_VERSION to return GC HWIP version.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dd0ae064

drm/amdkfd: replace/remove remaining kgd_dev references · 56c5977e

由 Graham Sider 提交于 10月 19, 2021

Remove get_amdgpu_device and other remaining kgd_dev references aside
from declaration/kfd struct entry and initialization.
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

56c5977e

drm/amdkfd: replace kgd_dev in various amgpu_amdkfd funcs · 6bfc7c7e

由 Graham Sider 提交于 10月 19, 2021

Modified definitions:

- amdgpu_amdkfd_submit_ib
- amdgpu_amdkfd_set_compute_idle
- amdgpu_amdkfd_have_atomics_support
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_flush_gpu_tlb_pasid
- amdgpu_amdkfd_gpu_reset
- amdgpu_amdkfd_alloc_gtt_mem
- amdgpu_amdkfd_free_gtt_mem
- amdgpu_amdkfd_alloc_gws
- amdgpu_amdkfd_free_gws
- amdgpu_amdkfd_ras_poison_consumption_handler
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6bfc7c7e

drm/amdkfd: replace kgd_dev in various kfd2kgd funcs · 3356c38d

由 Graham Sider 提交于 10月 14, 2021

Modified definitions:

- program_sh_mem_settings
- set_pasid_vmid_mapping
- init_interrupts
- address_watch_disable
- address_watch_execute
- wave_control_execute
- address_watch_get_offset
- get_atc_vmid_pasid_mapping_info
- set_scratch_backing_va
- set_vm_context_page_table_base
- read_vmid_from_vmfault_reg
- get_cu_occupancy
- program_trap_handler_settings
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3356c38d

drm/amdkfd: replace kgd_dev in hqd/mqd kfd2kgd funcs · 420185fd

由 Graham Sider 提交于 10月 15, 2021

Modified definitions:

- hqd_load
- hiq_mqd_load
- hqd_sdma_load
- hqd_dump
- hqd_sdma_dump
- hqd_is_occupied
- hqd_destroy
- hqd_sdma_is_occupied
- hqd_sdma_destroy
Signed-off-by: NGraham Sider <Graham.Sider@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

420185fd

06 11月, 2021 1 次提交

drm/amd/amdkfd: Don't sent command to HWS on kfd reset · b8c20c74

由 shaoyunl 提交于 11月 03, 2021

When kfd need to be reset, sent command to HWS might cause hang and get unnecessary timeout.
This change try not to touch HW in pre_reset and keep queues to be in the evicted state
when the reset is done, so they are not put back on the runlist. These queues will be destroied
on process termination.
Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b8c20c74

29 10月, 2021 1 次提交

drm/amdkfd: Add an optional argument into update queue operation(v2) · c6e559eb

由 Lang Yu 提交于 10月 08, 2021

Currently, queue is updated with data in queue_properties.
And all allocated resource in queue_properties will not
be freed until the queue is destroyed.

But some properties(e.g., cu mask) bring some memory
management headaches(e.g., memory leak) and make code
complex. Actually they have been copied to mqd and
don't have to persist in queue_properties.

Add an argument into update queue to pass such properties,
then we can remove them from queue_properties.

v2: Don't use void *.
Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NLang Yu <lang.yu@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c6e559eb

12 8月, 2021 1 次提交

drm/amdkfd: CWSR with software scheduler · b53ef0df

由 Mukul Joshi 提交于 8月 09, 2021

This patch adds support to program trap handler settings
when loading driver with software scheduler (sched_policy=2).
Signed-off-by: NMukul Joshi <mukul.joshi@amd.com>
Suggested-by: NJay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b53ef0df

23 7月, 2021 3 次提交

drm/amdkfd: enable cyan_skillfish KFD · 06e75b88

由 Tao Zhou 提交于 7月 13, 2021

Add KFD support for cyan_skillfish.

v2: whitespace fixes (Alex)
Signed-off-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

06e75b88

drm/amdkfd: Fix a concurrency issue during kfd recovery · 4f942aae

由 Oak Zeng 提交于 7月 15, 2021

start_cpsch and stop_cpsch can be called during kfd device
initialization or during gpu reset/recovery. So they can
run concurrently. Currently in start_cpsch and stop_cpsch,
pm_init and pm_uninit is not protected by the dpm lock.
Imagine such a case that user use packet manager's function
to submit a pm4 packet to hang hws (ie through command
cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id | sudo tee
/sys/kernel/debug/kfd/hang_hws), while kfd device is under
device reset/recovery so packet manager can be not initialized.
There will be unpredictable protection fault in such case.

This patch moves pm_init/uninit inside the dpm lock and check
packet manager is initialized before using packet manager
function.
Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
Acked-by: NChristian Konig <christian.koenig@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4f942aae

drm/amdkfd: Renaming dqm->packets to dqm->packet_mgr · 9af5379c

由 Oak Zeng 提交于 7月 15, 2021

Renaming packets to packet_mgr to reflect the real meaning
of this variable.
Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
Acked-by: NChristian Konig <christian.koenig@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9af5379c

19 6月, 2021 1 次提交

drm/amdkfd: Walk through list with dqm lock hold · 56f221b6

由 xinhui pan 提交于 6月 15, 2021

To avoid any list corruption.
Signed-off-by: Nxinhui pan <xinhui.pan@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

56f221b6

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功