- 14 7月, 2018 1 次提交
-
-
由 Yong Zhao 提交于
On Raven there is only one SDMA engine instead of previously assumed two, so we need to adapt our code to this new scenario. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 12 7月, 2018 6 次提交
-
-
由 Yong Zhao 提交于
This will make reading code much easier. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
This avoids triggering a GPU reset or otherwise changing the HW state. Instead KFD will hang, which allows HW debugging tools to analyze the problem. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Shaoyun Liu 提交于
Signed-off-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Shaoyun Liu 提交于
The reset will be performed in a new hw_exception work thread to handle HWS hang without blocking the thread that detected the hang. Signed-off-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 shaoyunl 提交于
1. Pre-GFX9 the amdgpu ISR saves the vm-fault status and address per per-vmid. amdkfd needs to get the information from amdgpu through the new get_vm_fault_info interface. On GFX9 and later, all the required information is in the IH ring 2. amdkfd unmaps all queues from the faulting process and create new run-list without the guilty process 3. amdkfd notifies the runtime of the vm fault trap via EVENT_TYPE_MEMORY Signed-off-by: Nshaoyun liu <shaoyun.liu@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
This is needed to prevent deadlocks when MMU notifiers run in reclaim-FS context and take the DQM lock for userptr evictions. Previously this was done by making all memory allocations under DQM locks GFP_NOIO. This is error prone. Using memalloc_nofs_save/restore will reliably affect all memory allocations anywhere in the kernel while the DQM lock is held. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 02 5月, 2018 1 次提交
-
-
由 Oak Zeng 提交于
Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 11 4月, 2018 3 次提交
-
-
由 Felix Kuehling 提交于
Signed-off-by: NJohn Bridgman <john.bridgman@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
This is in preparation for GFXv9 (Vega10) which uses incompatible PM4 packet formats from previous ASIC generations. Signed-off-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Allocate doorbells according to the doorbell routing information on SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the queue_id as the doorbell ID to maintain compatibility with the Thunk. Signed-off-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 24 3月, 2018 1 次提交
-
-
由 Felix Kuehling 提交于
Deallocate SDMA queues during abnormal process termination and when queue creation fails after the SDMA allocation. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 16 3月, 2018 1 次提交
-
-
由 Felix Kuehling 提交于
On GFX7 the CP does not perform a TC flush when queues are unmapped. To avoid TC eviction from accessing an invalid VMID, flush it explicitly before releasing a VMID. v2: Fix unnecessary list_for_each_entry_safe v3: Moved allocation to kfd_process_device_init_vm Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 07 2月, 2018 3 次提交
-
-
由 Felix Kuehling 提交于
When the TTM memory manager in KGD evicts BOs, all user mode queues potentially accessing these BOs must be evicted temporarily. Once user mode queues are evicted, the eviction fence is signaled, allowing the migration of the BO to proceed. A delayed worker is scheduled to restore all the BOs belonging to the evicted process and restart its queues. During suspend/resume of the GPU we also evict all processes to allow KGD to save BOs in system memory, since VRAM will be lost. v2: * Account for eviction when updating of q->is_active in MQD manager Signed-off-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Create/destroy the GPUVM context during PDD creation/destruction. Get VM page table base and program it during process registration (HWS) or VMID allocation (non-HWS). v2: * Used dev instead of pdd->dev in kfd_flush_tlb Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Harish Kasiviswanathan 提交于
Unaligned atomic operations can cause problems on some CPU architectures. Use simpler bitmask operations instead. Atomic bit manipulations are not necessary since dqm->lock is held during these operations. Signed-off-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 03 1月, 2018 2 次提交
-
-
由 Yong Zhao 提交于
When destroying an inactive queue, we don't need to call execute_queues_cpsch. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NOak Zeng <oak.zeng@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NOak Zeng <oak.zeng@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 05 1月, 2018 2 次提交
-
-
由 Felix Kuehling 提交于
GFXv7 and v8 dGPUs use a different addressing mode for KFD compared to APUs (GPUVM64 vs HSA64). And dGPUs don't support MTYPE_CC. They use MTYPE_UC instead for memory that requires coherency. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Some dGPUs don't support HWS. Allow them to use a per-device sched_policy that may be different from the global default. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 28 11月, 2017 1 次提交
-
-
由 Felix Kuehling 提交于
This commit adds several debugfs entries for kfd: kfd/hqds: dumps all HQDs on all GPUs for KFD-controlled compute and SDMA RLC queues kfd/mqds: dumps all MQDs of all KFD processes on all GPUs kfd/rls: dumps HWS runlists on all GPUs Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 25 11月, 2017 1 次提交
-
-
由 Yong Zhao 提交于
Signed-off-by: NYong Zhao <yong.zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 15 11月, 2017 2 次提交
-
-
由 Felix Kuehling 提交于
A second-level user mode trap handler can be installed. The CWSR trap handler jumps to the secondary trap handler conditionally for any conditions not handled by it. This can be used e.g. for debugging or catching math exceptions. When CWSR is disabled, the user mode trap handler is installed as first level trap handler. Signed-off-by: NShaoyun.liu <shaoyun.liu@amd.com> Signed-off-by: NJay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
This hardware feature allows the GPU to preempt shader execution in the middle of a compute wave, save the state and restore it later to resume execution. Memory for saving the state is allocated per queue in user mode and the address and size passed to the create_queue ioctl. The size depends on the number of waves that can be in flight simultaneously on a given ASIC. Signed-off-by: NShaoyun.liu <shaoyun.liu@amd.com> Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 02 11月, 2017 3 次提交
-
-
由 Felix Kuehling 提交于
These were missed previously when rebasing changes for upstreaming. v2: Remove redundant sched_policy conditions Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
map_queues_cpsch uses the queue_count to decide whether to upload a new runlist. So update the counter before calling it. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
Remove empty initialize function. Rename register_process to update_qpd to avoid confusion with the non-ASIC-specific register_process. Shorten ops_asic_specific to asic_ops. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 27 9月, 2017 5 次提交
-
-
由 shaoyunl 提交于
HWS does not support over-subscription and the scheduler can not internally modify the engine. Driver needs to program the correct engine ID. Fix the queue and engine selection to create queues on alternating SDMA engines. This allows concurrent bi-directional DMA transfers in a process that creates two SDMA queues. Signed-off-by: Nshaoyun liu <shaoyun.liu@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Removed unused num_concurrent_processes. Implemented counting of queues in QPD. This makes counting the queue list repeatedly in several places unnecessary. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Separate device queue termination from process queue manager termination. Unmap all queues at once instead of one at a time. Unmap device queues before the PASID is unbound, in the kfd_process_iommu_unbind_callback. When resetting wavefronts in non-HWS mode, do it before the VMID is released. Signed-off-by: NBen Goz <ben.goz@amd.com> Signed-off-by: Nshaoyun liu <shaoyun.liu@amd.com> Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
v2: Make queue mapping interfaces more consistent by passing unmap filter parameters directly to execute_queues_cpsch, same as unmap_queues_cpsch. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
When a queue is mapped, the MQD is owned by the FW. The FW overwrites the MQD on the next unmap operation. Therefore the queue must be unmapped before updating the MQD. For the non-HWS case, also fix disabling of queues and creation of queues in disabled state. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 08 10月, 2017 2 次提交
-
-
由 Yong Zhao 提交于
Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 27 9月, 2017 1 次提交
-
-
由 Yong Zhao 提交于
When unmapping the queues from HW scheduler, there are two actions: reset and preempt. So naming the variables with only preempt is inapproriate. For functions such as destroy_queues_cpsch, what they do actually is to unmap the queues on HW scheduler rather than to destroy them. Change the name to reflect that fact. On the other hand, there is already a function called destroy_queue_cpsch() which exactly destroys a queue, and the name is very close to destroy_queues_cpsch(), resulting in confusion. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
- 21 9月, 2017 5 次提交
-
-
由 Yong Zhao 提交于
Several functions in DQM are shared between cpsch and nocpsch code. Remove the misleading _nocpsch suffix from their names. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
There are already CHIP_* definitions under amd_shared.h file on amdgpu side, so KFD should reuse them rather than defining new ones. Using enum for asic type requires default cases on switch statements to prevent compiler warnings. WARN on unsupported ASICs. It should never get there because KFD should not be initialized on unsupported devices. v2: Replace BUG() with WARN and error return Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
The hard-coded values related to VMID were removed in KFD, as those values can be calculated in the KFD initialization function. v2: remove unnecessary local variable Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Felix Kuehling 提交于
Adjust latencies and timeouts for dequeueing with HWS and consolidate them in one place. Make them longer to allow long running waves to complete without causing a timeout. The timeout is twice as long as the latency plus some buffer to make sure we don't detect a timeout prematurely. Change timeouts for dequeueing HQDs without HWS. KFD_UNMAP_LATENCY is more consistent with what the HWS does for user queues. Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NOded Gabbay <oded.gabbay@gmail.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-
由 Yong Zhao 提交于
The timeout in milliseconds should not be regarded as jiffies. This commit fixed that. v2: - use msecs_to_jiffies - change timeout_ms parameter to unsigned int to match msecs_to_jiffies Signed-off-by: NYong Zhao <yong.zhao@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NOded Gabbay <oded.gabbay@gmail.com>
-