Commits · 42c6c48214b726c30918e8dc80e2168607d13ae4 · openeuler / Kernel

08 Feb, 2022 9 commits

drm/amdkfd: CRIU checkpoint and restore queue mqds · 42c6c482

Authored by David Yat Sin on Jan 25, 2021

Checkpoint the contents of queue MQDs on CRIU dump and restore them during
CRIU restore.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU restore queue ids · 8668dfc3

Authored by David Yat Sin on Jan 25, 2021

When re-creating queues during CRIU restore, restore each queue with the
same queue id value used during CRIU dump.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU add queues support · 626f7b31

Authored by David Yat Sin on Jan 25, 2021

Add support to the existing CRIU ioctls to save the number of queues and the
queue properties for each queue during checkpoint, and to re-create the
queues on restore.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Implement KFD unpause operation · cd9f7910

Authored by David Yat Sin on Aug 16, 2021

Introduce the UNPAUSE op. After the CRIU amdgpu plugin performs a
PROCESS_INFO op, the queues stay in an evicted state. Once the plugin is done
draining BO contents, it is safe to perform an UNPAUSE op so that the queues
resume.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
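
The ordering described in the UNPAUSE commit message can be sketched as a tiny state machine. This is only an illustration under stated assumptions: the type and function names (`example_process`, `example_criu_ioctl`, `EXAMPLE_OP_*`) are hypothetical, and rejecting an UNPAUSE that was not preceded by PROCESS_INFO is an assumed policy, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical op codes; they only mirror the op names in the commit text. */
enum example_op {
    EXAMPLE_OP_PROCESS_INFO,
    EXAMPLE_OP_UNPAUSE,
};

/* Hypothetical per-process state: are the queues currently evicted? */
struct example_process {
    bool queues_paused;
};

/* PROCESS_INFO leaves the queues evicted; UNPAUSE lets them resume. */
int example_criu_ioctl(struct example_process *p, enum example_op op)
{
    assert(p != NULL);

    switch (op) {
    case EXAMPLE_OP_PROCESS_INFO:
        /* Queues stay evicted so BO contents can be drained safely. */
        p->queues_paused = true;
        return 0;
    case EXAMPLE_OP_UNPAUSE:
        /* Assumed policy: nothing to resume unless queues were paused. */
        if (!p->queues_paused)
            return -EINVAL;
        p->queues_paused = false;
        return 0;
    }
    return -EINVAL;
}
```

A dumper would call the sketch with `EXAMPLE_OP_PROCESS_INFO`, drain the buffer contents, and only then issue `EXAMPLE_OP_UNPAUSE`.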

drm/amdkfd: CRIU Implement KFD resume ioctl · 011bbb03

Authored by Rajneesh Bhardwaj on Jan 11, 2021

This adds support to create userptr BOs on restore and introduces a new
ioctl op to restart memory notifiers for the restored userptr BOs.
When doing a CRIU restore, MMU notifications can happen any time after we
call amdgpu_mn_register. Prevent MMU notifications until we reach stage 4 of
the restore process, i.e. the criu_resume ioctl op is received and the
process is ready to be resumed. This ioctl is different from the other KFD
CRIU ioctls since it is called by the CRIU master restore process for all the
target processes being resumed by CRIU.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Implement KFD restore ioctl · 73fa13b6

Authored by Rajneesh Bhardwaj on Dec 01, 2020

This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to the checkpointed image.
This ioctl creates various types of buffer objects such as VRAM,
MMIO, Doorbell and GTT based on the data sent from the userspace plugin.
The data mostly contains the previously checkpointed KFD images from
some KFD process.

While restoring a CRIU process, attach old IDR values to newly
created BOs. This also adds the minimal GPU mapping support for a
single-GPU checkpoint restore use case.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Implement KFD checkpoint ioctl · 5ccbb057

Authored by Rajneesh Bhardwaj on Nov 30, 2020

This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to the user space plugin running under the CRIU master
context, which then stores this info to recreate these buffer objects
during a restore operation.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Implement KFD process_info ioctl · f185381b

Authored by Rajneesh Bhardwaj on Aug 24, 2021

This IOCTL op is expected to be called as a precursor to the actual
Checkpoint operation. It does the basic discovery into the target
process seized by CRIU and relays the information to userspace, which
uses it to start the Checkpoint operation via another dedicated
IOCTL op.

The process_info IOCTL op determines the number of GPUs and buffer objects
associated with the target process, as well as its process id in the
caller's namespace. The pid is needed because the /proc/pid/mem interface
may be used to drain the contents of the discovered buffer objects in
userspace, and getpid returns the pid of the CRIU dumper process. Also, the
pid of a process inside a container might differ from its global pid, so
return the ns pid.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs · 36988070

Authored by Rajneesh Bhardwaj on Aug 24, 2021

Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on the same or a remote
machine. It expects that processes with an associated device file
(e.g. GPU) come with the necessary driver support to assist CRIU
and its extensible plugin interface. Thus, in order to support the
Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
kernel driver needs to provide a set of new APIs that provide the
necessary VRAM metadata and its contents to a userspace component
(CRIU plugin) that can store it in the form of image files.

This introduces some new ioctls which will be used to checkpoint-restore
any KFD-bound user process. KFD only allows ioctl calls from the same
process that opened the KFD file descriptor. Since these ioctls are
expected to be called from a KFD CRIU plugin, which has elevated
ptrace-attached privileges and CAP_CHECKPOINT_RESTORE capabilities attached
to the file descriptors, modify KFD to allow such calls.

(API redesigned by David Yat Sin)

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

28 Jan, 2022 2 commits

drm/amdkfd: Don't take process mutex for svm ioctls · ac7c48c0

Authored by Philip Yang on Jan 24, 2022

SVM ioctls take the proper svms->lock to handle race conditions, so they
don't need to take the process mutex to serialize ioctls. This also fixes a
circular locking warning:

WARNING: possible circular locking dependency detected

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock((work_completion)(&svms->deferred_list_work));
                                lock(&process->mutex);
                     lock((work_completion)(&svms->deferred_list_work));
   lock(&process->mutex);

   *** DEADLOCK ***

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: enable heavy-weight TLB flush on Vega20 · 1790b649

Authored by Eric Huang on Jan 21, 2022

It is to meet the requirement for memory allocation
optimization on MI50.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

20 Jan, 2022 1 commit

drm/amdkfd: enable heavy-weight TLB flush on Arcturus · f61c40c0

Authored by Eric Huang on Jan 18, 2022

SDMA FW fixes the hang issue for adding heavy-weight TLB
flush on Arcturus, so we can enable it.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

18 Nov, 2021 7 commits

drm/amdkfd: replace asic_family with asic_type · 7eb0502a

Authored by Graham Sider on Nov 10, 2021

asic_family was a duplicate of asic_type, both of type amd_asic_type.
Replace all instances of device_info->asic_family with adev->asic_type
and remove asic_family from device_info.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: convert misc checks to IP version checking · 046e674b

Authored by Graham Sider on Nov 09, 2021

Switch to IP version checking instead of asic_type on various KFD
version checks.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: convert KFD_IS_SOC to IP version checking · dd0ae064

Authored by Graham Sider on Nov 09, 2021

Defined as GC HWIP >= IP_VERSION(9, 0, 1).

Also defines KFD_GC_VERSION to return GC HWIP version.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
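
For readers unfamiliar with IP version checks, the comparison named in this commit message can be sketched in plain C. This is an illustration under assumptions, not the driver code: the packing used by `IP_VERSION` here (major, minor and revision in separate bit fields) is assumed, and `example_dev`, `example_gc_version` and `example_is_soc` are hypothetical stand-ins for the device structure and the KFD macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: pack major/minor/revision so that versions can be
 * compared with ordinary relational operators. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical device structure; only the GC HWIP version is modelled. */
struct example_dev {
    uint32_t gc_ip_version;
};

/* Hypothetical counterpart of a "return the GC HWIP version" helper. */
static inline uint32_t example_gc_version(const struct example_dev *dev)
{
    assert(dev != NULL);
    return dev->gc_ip_version;
}

/* The check from the commit text: GC HWIP >= IP_VERSION(9, 0, 1). */
static inline bool example_is_soc(const struct example_dev *dev)
{
    return example_gc_version(dev) >= (uint32_t)IP_VERSION(9, 0, 1);
}
```

Because the fields are packed most-significant first, a single integer comparison orders versions by major, then minor, then revision, which is what lets a range check replace a list of ASIC names.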

drm/amdkfd: replace trivial funcs with direct access · 02274fc0

Authored by Graham Sider on Nov 05, 2021

These get funcs simply return an adev field. Replace funcs/calls with
direct field accesses instead.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: replace kgd_dev in gpuvm amdgpu_amdkfd funcs · dff63da9

Authored by Graham Sider on Oct 19, 2021

Modified definitions:

- amdgpu_amdkfd_gpuvm_acquire_process_vm
- amdgpu_amdkfd_gpuvm_release_process_vm
- amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu
- amdgpu_amdkfd_gpuvm_free_memory_of_gpu
- amdgpu_amdkfd_gpuvm_map_memory_to_gpu
- amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu
- amdgpu_amdkfd_gpuvm_sync_memory
- amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel
- amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel
- amdgpu_amdkfd_gpuvm_get_vm_fault_info
- amdgpu_amdkfd_gpuvm_import_dmabuf
- amdgpu_amdkfd_get_tile_config

Removed:

- get_amdgpu_device

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: replace kgd_dev in get amdgpu_amdkfd funcs · 574c4183

Authored by Graham Sider on Oct 19, 2021

Modified definitions:

- amdgpu_amdkfd_get_fw_version
- amdgpu_amdkfd_get_local_mem_info
- amdgpu_amdkfd_get_gpu_clock_counter
- amdgpu_amdkfd_get_max_engine_clock_in_mhz
- amdgpu_amdkfd_get_cu_info
- amdgpu_amdkfd_get_dmabuf_info
- amdgpu_amdkfd_get_vram_usage
- amdgpu_amdkfd_get_hive_id
- amdgpu_amdkfd_get_unique_id
- amdgpu_amdkfd_get_mmio_remap_phys_addr
- amdgpu_amdkfd_get_num_gws
- amdgpu_amdkfd_get_asic_rev_id
- amdgpu_amdkfd_get_noretry
- amdgpu_amdkfd_get_xgmi_hops_count
- amdgpu_amdkfd_get_xgmi_bandwidth_mbytes
- amdgpu_amdkfd_get_pcie_bandwidth_mbytes

Also replaces kfd_device_by_kgd with kfd_device_by_adev, now
searching via adev rather than kgd.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: replace kgd_dev in various kfd2kgd funcs · 3356c38d

Authored by Graham Sider on Oct 14, 2021

Modified definitions:

- program_sh_mem_settings
- set_pasid_vmid_mapping
- init_interrupts
- address_watch_disable
- address_watch_execute
- wave_control_execute
- address_watch_get_offset
- get_atc_vmid_pasid_mapping_info
- set_scratch_backing_va
- set_vm_context_page_table_base
- read_vmid_from_vmfault_reg
- get_cu_occupancy
- program_trap_handler_settings

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29 Oct, 2021 2 commits

drm/amdkfd: Remove cu mask from struct queue_properties(v2) · 7c695a2c

Authored by Lang Yu on Oct 08, 2021

Actually, cu_mask has been copied to mqd memory and
doesn't have to persist in queue_properties. Remove it
from queue_properties.

And use struct mqd_update_info to store such properties,
then pass it to the update queue operation.

v2:
* Rename pqm_update_queue to pqm_update_queue_properties.
* Rename struct queue_update_info to struct mqd_update_info.
* Rename pqm_set_cu_mask to pqm_update_mqd.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Separate pinned BOs destruction from general routine · 68df0f19

Authored by Lang Yu on Oct 11, 2021

Currently, all kfd BOs use the same destruction routine. But pinned
BOs are not unpinned properly. Separate them from the general routine.

v2 (Felix):
Add safeguard to prevent user space from freeing signal BO.
Kunmap signal BO in the event of setting event page error.
Just kunmap signal BO to avoid duplicating the code.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Oct, 2021 3 commits

amd/amdkfd: remove svms declaration to avoid werror · 7e3fb209

Authored by Alex Sierra on Sep 30, 2021

The svm_range_list svms declaration is removed to avoid werror when
CONFIG_HSA_AMD_SVM is not enabled.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix KFDSVMRangeTest.PartialUnmapSysMemTest fails · 9c152f54

Authored by Yifan Zhang on Aug 14, 2021

[ RUN      ] KFDSVMRangeTest.PartialUnmapSysMemTest
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:245: Failure
Value of: (hsaKmtAllocMemory(m_Node, m_Size, m_Flags, &m_pBuf))
  Actual: 1
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:248: Failure
Value of: (hsaKmtMapMemoryToGPUNodes(m_pBuf, m_Size, __null, mapFlags, 1, &m_Node))
  Actual: 1
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDTestUtil.cpp:306: Failure
Expected: ((void *)__null) != (ptr), actual: NULL vs NULL
Segmentation fault (core dumped)
[          ] Profile: Full Test
[          ] HW capabilities: 0x9

kernel log:

[  102.029150]  ret_from_fork+0x22/0x30
[  102.029158] ---[ end trace 15c34e782714f9a3 ]---
[ 3613.603598] amdgpu: Address: 0x7f7149ccc000 already allocated by SVM
[ 3613.610620] show_signal_msg: 27 callbacks suppressed

There is a race with deferred actions from previous memory map
changes (e.g. munmap). Flush pending deferred work to avoid such a case.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: avoid conflicting address mappings · 71cbfeb3

Authored by Alex Sierra on Jun 07, 2021

[Why]
Avoid conflicts when address ranges already mapped by the SVM
mechanism are allocated again through ioctl_alloc in the same
process, and vice versa.

[How]
For ioctl_alloc_memory_of_gpu allocations:
Check that the address range passed into the ioctl memory
alloc does not already exist in the kfd_process
svms->objects interval tree.

For SVM allocations:
Look for the address range in the VA interval tree of
the VM inside each of the pdds used in a kfd_process.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
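
The conflict check described under [How] boils down to asking whether a requested address range overlaps any range that is already tracked. A minimal sketch follows, with loud assumptions: it uses a linear scan over an array instead of the interval tree named in the commit message, ranges are treated as inclusive on both ends, and `va_range`, `ranges_overlap` and `range_conflicts` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range, inclusive on both ends. */
struct va_range {
    uint64_t start;
    uint64_t last;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(const struct va_range *a, const struct va_range *b)
{
    return a->start <= b->last && b->start <= a->last;
}

/* Linear-scan stand-in for an interval-tree lookup: returns true when the
 * requested range collides with any already-tracked range. */
static bool range_conflicts(const struct va_range *tracked, size_t count,
                            const struct va_range *req)
{
    assert(req != NULL);

    for (size_t i = 0; i < count; i++) {
        if (ranges_overlap(&tracked[i], req))
            return true;
    }
    return false;
}
```

An interval tree answers the same question in logarithmic rather than linear time, which matters once a process tracks many ranges. The same predicate would be applied in both directions: new ioctl allocations checked against SVM ranges, and new SVM ranges checked against each VM's mappings.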

03 Aug, 2021 4 commits

drm/amdkfd: Only apply heavy-weight TLB flush on Aldebaran · a50fe707

Authored by Eric Huang on Jul 14, 2021

It works around a HW bug on other ASICs and is based on
bringing back two previously reverted commits:
  drm/amdkfd: Add heavy-weight TLB flush after unmapping
  drm/amdkfd: Add memory sync before TLB flush on unmap

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "Revert "drm/amdkfd: Add memory sync before TLB flush on unmap"" · 626803d1

Authored by Eric Huang on Jul 26, 2021

This reverts commit 4bba567c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "Revert "drm/amdkfd: Make TLB flush conditional on mapping"" · fce1a7eb

Authored by Eric Huang on Jul 26, 2021

This reverts commit 7ed9876c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping"" · 4a134261

Authored by Eric Huang on Jul 26, 2021

This reverts commit 430f8e6e.

Revert reason: Issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29 Jul, 2021 3 commits

Revert "Revert "drm/amdkfd: Add memory sync before TLB flush on unmap"" · b928ecfb

Authored by Eric Huang on Jul 26, 2021

This reverts commit 4bba567c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "Revert "drm/amdkfd: Make TLB flush conditional on mapping"" · 8f0e2d5c

Authored by Eric Huang on Jul 26, 2021

This reverts commit 7ed9876c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping"" · f8753434

Authored by Eric Huang on Jul 26, 2021

This reverts commit 430f8e6e.

Revert reason: Issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

13 Jul, 2021 6 commits

Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping" · 5adcd745

Authored by Eric Huang on Jul 09, 2021

This reverts commit 1098d658.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Make TLB flush conditional on mapping" · c37387c3

Authored by Eric Huang on Jul 09, 2021

This reverts commit 31f33243.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Add memory sync before TLB flush on unmap" · f5cc09ac

Authored by Eric Huang on Jul 09, 2021

This reverts commit 3be4dca1.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping" · 430f8e6e

Authored by Eric Huang on Jul 09, 2021

This reverts commit 1098d658.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Make TLB flush conditional on mapping" · 7ed9876c

Authored by Eric Huang on Jul 09, 2021

This reverts commit 31f33243.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Add memory sync before TLB flush on unmap" · 4bba567c

Authored by Eric Huang on Jul 09, 2021

This reverts commit 3be4dca1.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16 Jun, 2021 1 commit

drm/amdkfd: Disable SVM per GPU, not per process · 5a75ea56

Authored by Felix Kuehling on Jun 10, 2021

When some GPUs don't support SVM, don't disable it for the entire process.
That would be inconsistent with the information the process got from the
topology, which indicates SVM support per GPU.

Instead disable SVM support only for the unsupported GPUs. This is done
by checking any per-device attributes against the bitmap of supported
GPUs. Also use the supported GPU bitmap to initialize access bitmaps for
new SVM address ranges.

Don't handle recoverable page faults from unsupported GPUs. (I don't
think there will be unsupported GPUs that can generate recoverable page
faults. But better safe than sorry.)

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

12 Jun, 2021 1 commit

drm/amdkfd: Add memory sync before TLB flush on unmap · 3be4dca1

Authored by Eric Huang on Jun 10, 2021

It is to fix a failure for SDMA updating PTEs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

05 Jun, 2021 1 commit

drm/amdkfd: Make TLB flush conditional on mapping · 31f33243

Authored by Eric Huang on Jun 01, 2021

It is to optimize memory mapping latency, and also to avoid
a page fault in a corner case of changing a valid PDE into
a PTE.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel synced successfully almost 2 years ago
