- 11 Apr, 2018 1 commit
-
-
Committed by Felix Kuehling
This prepares for GFXv9 (Vega10), which has 64-bit doorbells.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 24 Mar, 2018 2 commits
-
-
Committed by Felix Kuehling
These interfaces allow KGD to stop and resume all GPU user mode queue access to a process address space. This is needed for handling MMU notifiers of userptrs mapped for GPU access in KFD VMs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Restoring multiple processes concurrently can lead to live-locks where each process prevents the other from validating all its BOs.
v2: fix duplicate check of same variable
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 16 Mar, 2018 7 commits
-
-
Committed by Felix Kuehling
Simulate a large-BAR system by exporting only visible memory. This limits the amount of available VRAM to the size of the BAR, but enables CPU access to VRAM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
The events page must be accessible in user mode by the GPU and CPU as well as in kernel mode by the CPU. On dGPUs, user mode virtual addresses are managed by the Thunk's GPU memory allocation code. Therefore we can't allocate the memory in kernel mode like we do on APUs. But KFD still needs to map the memory for kernel access. To facilitate this, the Thunk provides the buffer handle of the events page to KFD when creating the first event.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
v2:
* Fix error handling after kfd_bind_process_to_device in kfd_ioctl_map_memory_to_gpu
v3:
* Add ioctl to acquire VM from a DRM FD
v4:
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
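The v4 notes on returning the number of successful map/unmap operations and allowing a partial retry can be illustrated with a small user-space sketch. This is not the KFD ioctl code: `map_one`, `map_batch`, and the `fail_once_id` failure injection are hypothetical names invented for the illustration. The idea is only that the caller keeps a running success count, so a retry resumes after the entries that were already handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical failure injection: the device with this id fails once. */
static int fail_once_id = -1;

/* Hypothetical per-device map operation: 0 on success, -1 on failure. */
static int map_one(int device_id)
{
    if (device_id == fail_once_id) {
        fail_once_id = -1;      /* the next attempt will succeed */
        return -1;
    }
    return 0;
}

/*
 * Map devices[*n_success .. n_devices - 1].
 * On failure, *n_success holds the number of entries already mapped,
 * so calling map_batch() again skips them instead of starting over.
 */
static int map_batch(const int *devices, size_t n_devices, size_t *n_success)
{
    size_t i;

    assert(devices != NULL && n_success != NULL);
    for (i = *n_success; i < n_devices; i++) {
        if (map_one(devices[i]) != 0)
            return -1;
        *n_success = i + 1;
    }
    return 0;
}
```

A caller would initialise the success count to zero, call `map_batch`, and on failure call it again with the same counter once the cause of the failure has been dealt with.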
-
Committed by Felix Kuehling
On GFX7 the CP does not perform a TC flush when queues are unmapped. To avoid TC eviction from accessing an invalid VMID, flush it explicitly before releasing a VMID.
v2: Fix unnecessary list_for_each_entry_safe
v3: Moved allocation to kfd_process_device_init_vm
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Also used for cleaning up on process termination.
v2: Refactored cleanup on process termination
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Set up the GPUVM aperture for SVM (shared virtual memory) that allows sharing a part of virtual address space between GPUs and CPUs. Report the size of the GPUVM aperture that is supported by KGD accurately.
The low part of the GPUVM aperture is reserved for kernel use. This is for kernel-allocated buffers that are only accessed on the GPU:
 - CWSR trap handler
 - IB for submitting commands in user-mode context from kernel mode
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Instead of creating all VMs on process creation, create them when a process is bound to a device. This will later allow registering an existing VM from a DRM render node FD at runtime, before the process is bound to the device. This way the render node VM can be used for KFD instead of creating our own redundant VM.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 07 Feb, 2018 2 commits
-
-
Committed by Felix Kuehling
When the TTM memory manager in KGD evicts BOs, all user mode queues potentially accessing these BOs must be evicted temporarily. Once user mode queues are evicted, the eviction fence is signaled, allowing the migration of the BO to proceed.
A delayed worker is scheduled to restore all the BOs belonging to the evicted process and restart its queues.
During suspend/resume of the GPU we also evict all processes to allow KGD to save BOs in system memory, since VRAM will be lost.
v2:
* Account for eviction when updating q->is_active in MQD manager
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Create/destroy the GPUVM context during PDD creation/destruction. Get VM page table base and program it during process registration (HWS) or VMID allocation (non-HWS).
v2:
* Used dev instead of pdd->dev in kfd_flush_tlb
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 10 Jan, 2018 1 commit
-
-
Committed by Oded Gabbay
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
-
- 09 Dec, 2017 5 commits
-
-
Committed by Felix Kuehling
Some systems have broken CRAT tables. Add a module option to ignore a CRAT table.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Harish Kasiviswanathan
Generate and parse VCRAT tables for dGPUs in kfd_topology_add_device. Some information that isn't available in the CRAT table is patched into the topology after parsing.
HSA_CAP_DOORBELL_TYPE_1_0 is dependent on the ASIC feature CP_HQD_PQ_CONTROL.SLOT_BASED_WPTR, which was not introduced in VI until Carrizo. Report HSA_CAP_DOORBELL_TYPE_PRE_1_0 on Tonga ASICs.
v2: Added #include <linux/pci.h> to kfd_crat.c to make it compile
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Currently, the KFD topology information is generated by parsing the CRAT (ACPI) table. However, at present the CRAT table is available only for AMD APUs. To support CPUs on systems without a CRAT table, the KFD driver will create a Virtual CRAT (VCRAT) table, and the existing code will then parse that table to generate the topology.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Harish Kasiviswanathan
Modify the kfd_topology_enum_kfd_devices(..) function to support non-GPU nodes. The function returned NULL when it encountered non-GPU (say CPU) nodes. This caused kfd_ioctl_create_event and kfd_init_apertures to fail for Intel + Tonga.
kfd_topology_enum_kfd_devices will now parse all the nodes and return a valid kfd_dev for nodes with a GPU.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on ASIC information. Also allow building KFD without IOMMUv2 support. This is still useful for dGPUs and prepares for enabling KFD on architectures that don't support AMD IOMMUv2.
v2:
* Centralize IOMMUv2 code to avoid #ifdefs in too many places
v3:
* Imply AMD_IOMMU_V2 in Kconfig
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 05 Jan, 2018 2 commits
-
-
Committed by Felix Kuehling
On dGPUs, don't set ATC addressing bits and use MTYPE_UC for coherent memory.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
This will be needed for most dGPUs.
CC: linux-pci@vger.kernel.org
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 28 Nov, 2017 4 commits
-
-
Committed by Felix Kuehling
Use a reference counter instead of a lock to prevent process destruction while functions running out of process context are using the kfd_process structure. In many cases these functions don't need the structure to be locked. In the few cases that really do need the process lock, take it explicitly.
This helps simplify lock dependencies between the process lock and other locks, particularly amdgpu and mm_struct locks. This will be important when amdgpu calls back to amdkfd for memory evictions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
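The reference-counting approach described in this commit can be sketched in user-space C. This is a simplified illustration, not the driver code: `struct proc_obj`, `proc_get`, `proc_put`, and `proc_freed_count` are hypothetical names, and C11 atomics stand in for whatever counting primitive the kernel code actually uses. The point is that a holder of a reference keeps the object alive without holding any lock on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a process structure (not the kfd_process layout). */
struct proc_obj {
    atomic_int refcount;
    int pasid;
};

/* Demonstration only: counts how many objects have been released. */
static int proc_freed_count;

/* Create an object holding one reference owned by the caller. */
static struct proc_obj *proc_create(int pasid)
{
    struct proc_obj *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    atomic_init(&p->refcount, 1);
    p->pasid = pasid;
    return p;
}

/* Take an extra reference; the caller must already hold a valid one. */
static void proc_get(struct proc_obj *p)
{
    int old = atomic_fetch_add(&p->refcount, 1);

    assert(old > 0);
    (void)old;
}

/* Drop a reference; returns 1 if this call released the object. */
static int proc_put(struct proc_obj *p)
{
    if (atomic_fetch_sub(&p->refcount, 1) == 1) {
        free(p);
        proc_freed_count++;
        return 1;
    }
    return 0;
}
```

Code that runs outside the owning context would call `proc_get` before using the object and `proc_put` when done; only the code paths that modify shared state would additionally need a lock.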
-
Committed by Felix Kuehling
This will be used to eliminate the use of the process lock for preventing concurrent process destruction. This will simplify lock dependencies between KFD and KGD.
This also simplifies the process destruction in a few ways:
* Don't allocate work struct dynamically
* Remove unnecessary hack that increments mm reference counter
* Remove unnecessary process locking during destruction
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
This commit adds several debugfs entries for kfd:
 kfd/hqds: dumps all HQDs on all GPUs for KFD-controlled compute and SDMA RLC queues
 kfd/mqds: dumps all MQDs of all KFD processes on all GPUs
 kfd/rls: dumps HWS runlists on all GPUs
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Allow HWS to execute multiple processes on the hardware concurrently. The number of concurrent processes is limited by the number of VMIDs allocated to the HWS.
A module parameter can be used to limit this further or to turn it off altogether (mainly for debugging purposes).
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
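How a hardware limit and a user-supplied parameter might combine can be shown with a short sketch. The function name `max_concurrent_procs` and the conventions used here are assumptions made for the illustration (a negative request falls back to the VMID count, and zero means one process at a time); the commit message does not spell out the exact semantics of the module parameter.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute how many processes may run concurrently.
 *   requested < 0   -> invalid, fall back to the hardware limit (assumed)
 *   requested == 0  -> concurrency turned off, one process at a time (assumed)
 *   otherwise       -> the smaller of the request and the VMID count
 */
static unsigned int max_concurrent_procs(int requested, unsigned int num_vmids)
{
    assert(num_vmids > 0);

    if (requested < 0)
        return num_vmids;
    if (requested == 0)
        return 1;
    if ((unsigned int)requested < num_vmids)
        return (unsigned int)requested;
    return num_vmids;
}
```

With 8 VMIDs, for example, a request of 4 yields 4 and a request of 16 is clamped to 8.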
-
- 15 Nov, 2017 1 commit
-
-
Committed by Felix Kuehling
This hardware feature allows the GPU to preempt shader execution in the middle of a compute wave, save the state and restore it later to resume execution.
Memory for saving the state is allocated per queue in user mode, and the address and size are passed to the create_queue ioctl. The size depends on the number of waves that can be in flight simultaneously on a given ASIC.
Signed-off-by: Shaoyun.liu <shaoyun.liu@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 02 Nov, 2017 3 commits
-
-
Committed by Felix Kuehling
Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
These were missed previously when rebasing changes for upstreaming.
v2: Remove redundant sched_policy conditions
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Yong Zhao
A list of per-process queues is maintained in the kfd_process_queue_manager, so the queues array in kfd_process is redundant and in fact unused.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 28 Oct, 2017 7 commits
-
-
Committed by Andres Rodriguez
In systems under heavy load the IH work may experience significant scheduling delays.
Under load + system workqueue:
 Max Latency: 7.023695 ms
 Avg Latency: 0.263994 ms
Under load + high priority workqueue:
 Max Latency: 1.162568 ms
 Avg Latency: 0.163213 ms
Further work is required to measure the impact of per-cpu settings on IH performance.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Andres Rodriguez
Replace our implementation of a lockless ring buffer with the standard Linux kernel kfifo. We shouldn't maintain our own version of a standard data structure.
Signed-off-by: Andres Rodriguez <andres.rodriguez@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
This allows increasing the KFD_SIGNAL_EVENT_LIMIT in kfd_ioctl.h without breaking processes built with older kfd_ioctl.h versions.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Signal slots are identical to event IDs. Replace the used_slot_bitmap and events hash table with an IDR to allocate and look up event IDs and signal slots more efficiently.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
The first event page is always big enough to handle all events. Handling of multiple events pages is not supported by user mode, and not necessary.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Cleaned up the code while resolving some potential bugs and inconsistencies in the process.
Clean-ups:
* Remove enum kfd_event_wait_result, which duplicates KFD_IOC_EVENT_RESULT definitions
* alloc_event_waiters can be called without holding p->event_mutex
* Return an error code from copy_signaled_event_data instead of bool
* Clean up error handling code paths to minimize duplication in kfd_wait_on_events
Fixes:
* Consistently return an error code from kfd_wait_on_events and set wait_result to KFD_IOC_WAIT_RESULT_FAIL in all failure cases.
* Always call free_waiters while holding p->event_mutex
* copy_signaled_event_data might sleep. Don't call it while the task state is TASK_INTERRUPTIBLE.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
The kfd_process doesn't own a reference to the mm_struct, so it can disappear without warning even while the kfd_process still exists. Therefore, avoid dereferencing the kfd_process.mm pointer and make it opaque. Use get_task_mm to get a temporary reference to the mm when it's needed.
v2: removed unnecessary WARN_ON
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
- 27 Sep, 2017 5 commits
-
-
Committed by Felix Kuehling
Removed unused num_concurrent_processes. Implemented counting of queues in QPD. This makes counting the queue list repeatedly in several places unnecessary.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
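The idea of keeping a queue count next to the queue list, rather than walking the list whenever the count is needed, can be sketched as follows. The types and functions (`struct queue_node`, `struct qpd_sketch`, `qpd_add`, `qpd_remove`) are hypothetical and much simpler than the driver's structures; they only show the counter being updated at the same points where the list changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue entry in a singly linked list. */
struct queue_node {
    struct queue_node *next;
    int id;
};

/* Hypothetical per-process-per-device data with a cached queue count. */
struct qpd_sketch {
    struct queue_node *head;
    unsigned int queue_count;
};

/* Insert at the head and keep the cached count in step with the list. */
static void qpd_add(struct qpd_sketch *q, struct queue_node *n)
{
    assert(q != NULL && n != NULL);
    n->next = q->head;
    q->head = n;
    q->queue_count++;
}

/* Unlink the node with the given id; returns 1 if found, 0 otherwise. */
static int qpd_remove(struct qpd_sketch *q, int id)
{
    struct queue_node **pp;

    assert(q != NULL);
    for (pp = &q->head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            *pp = (*pp)->next;
            q->queue_count--;
            return 1;
        }
    }
    return 0;
}
```

Reading `queue_count` is then a constant-time operation, at the cost of having to update it on every add and remove.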
-
Committed by Yong Zhao
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
Separate device queue termination from process queue manager termination. Unmap all queues at once instead of one at a time. Unmap device queues before the PASID is unbound, in the kfd_process_iommu_unbind_callback.
When resetting wavefronts in non-HWS mode, do it before the VMID is released.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Yong Zhao
When unmapping the queues from the HW scheduler, there are two actions: reset and preempt. So naming the variables with only preempt is inappropriate.
For functions such as destroy_queues_cpsch, what they actually do is unmap the queues on the HW scheduler rather than destroy them. Change the name to reflect that fact. In addition, there is already a function called destroy_queue_cpsch() which does destroy a queue, and its name is very close to destroy_queues_cpsch(), resulting in confusion.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
-
Committed by Felix Kuehling
PASID management is moving into KGD. Limiting the PASID range to the number of doorbell pages is no longer practical.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-