- 24 4月, 2021 1 次提交
-
-
由 Jonathan Kim 提交于
In order to support multi-process debugging, HWS PM4 packet MAP_PROCESS requires an extension of 5 DWORDS to support targeting of per-vmid SPI debug control registers as well as watch points per process. Signed-off-by: NJonathan Kim <jonathan.kim@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 21 4月, 2021 1 次提交
-
-
由 Philip Yang 提交于
Register vram memory as MEMORY_DEVICE_PRIVATE type resource, to allocate vram backing pages for page migration. Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 4月, 2021 1 次提交
-
-
由 Amber Lin 提交于
Power Management IP is initialized/enabled before KFD init. When a thermal throttling happens before kfd_smi_init is done, calling the KFD SMI update function causes a stack dump by referring a NULL pointer ( smi_clients list). Check if kfd_init is completed before calling the function. Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Reviewed-by: NMukul Joshi <mukul.joshi@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 3月, 2021 1 次提交
-
-
由 Jonathan Kim 提交于
Create dedicated Aldebaran kfd2kgd callbacks to prepare for new per-vmid register instructions for debug trap setting functions and sending host traps. v2: rebase (Alex) Signed-off-by: NJonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 3月, 2021 2 次提交
-
-
由 Jay Cornwall 提交于
Similar to arcturus, but ARCH/ACC VGPRs may now be split unevenly. A new field in SQ_WAVE_GPR_ALLOC tracks the boundary between the two sets of VGPRs. Squash below patches: drm/amdkfd: Use preprocessor for IP-specific trap handler code drm/amdkfd: Fix VGPR restore race in gfx8/gfx9 trap handler drm/amdkfd: Remove duplicated code in gfx9 trap handler drm/amdkfd: Separate ARCH/ACC VGPR restore in trap handler drm/amdkfd: Reverse order of ARCH/ACC VGPR restore in trap handler Signed-off-by: NJay Cornwall <jay.cornwall@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
Add initial KFD support. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 12月, 2020 1 次提交
-
-
由 Harish Kasiviswanathan 提交于
GFX10 CP firmware expects PCIe atomics support. Don't enumerate GFX10 devices on platforms (PCIe v2) that don't support PCIe atomics. Currently, some of the applications like clinfo soft hangs on platforms without PCIe atomics support. Signed-off-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 30 10月, 2020 1 次提交
-
-
由 Kent Russell 提交于
Since the unique_id is now obtained in amdgpu in smu_late_init, topology misses getting the value during KFD device initialization. To work around this, we use amdgpu_amdkfd_get_unique_id to get the unique_id at read time. Due to this, we can remove unique_id from the kfd_dev structure, since we only need it in the KFD node properties struct Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 10月, 2020 2 次提交
-
-
由 Chengming Gui 提交于
Add KFD support. Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Chengming Gui 提交于
Add KFD support for dimgrey cavefish. v2: rebase (Alex) Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 10月, 2020 1 次提交
-
-
由 Huang Rui 提交于
This patch is to add GFX10 based APU Van Gogh KFD support. We will treat Van Gogh as "dgpu" (bypass IOMMU v2). Signed-off-by: NHuang Rui <ray.huang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 9月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
This will allow us to have different defaults per asic in a future patch. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 9月, 2020 1 次提交
-
-
由 Mukul Joshi 提交于
Move doorbell allocation for a process into kfd device and allocate doorbell space in each PDD during process creation. Currently, KFD manages its own doorbell space but for some devices, amdgpu would allocate the complete doorbell space instead of leaving a chunk of doorbell space for KFD to manage. In a system with mix of such devices, KFD would need to request process doorbell space based on the type of device, either from amdgpu or from its own doorbell space. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 9月, 2020 1 次提交
-
-
由 YueHaibing 提交于
If KFD_SUPPORT_IOMMU_V2 is not set, gcc warns: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_device.c:121:37: warning: ‘raven_device_info’ defined but not used [-Wunused-const-variable=] static const struct kfd_device_info raven_device_info = { ^~~~~~~~~~~~~~~~~ As Huang Rui suggested, Raven already has the fallback path, so it should be out of IOMMU v2 flag. Suggested-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Acked-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 9月, 2020 1 次提交
-
-
由 Mukul Joshi 提交于
Add support for reporting GPU reset events through SMI. KFD would report both pre and post GPU reset events. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 8月, 2020 3 次提交
-
-
由 Felix Kuehling 提交于
No need to use a function pointer because the implementation is not ASIC-specific. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
No need to use a function pointer because the implementation is not ASIC-specific. This fixes missing support due to a missing function pointer on Arcturus. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
We still have a few iommu issues which need to address, so force raven as "dgpu" path for the moment. This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled or ACPI CRAT table not correct. v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2. v3: Align with existed thunk, don't change the way of raven, only renoir will use "dgpu" path by default. v4: don't update global ignore_crat in the driver, and revise fallback function if CRAT is broken. v5: refine acpi crat good but no iommu support case, and rename the title. v6: fix the issue of dGPU initialized firstly, just modify the report value in the node_show(). Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Mukul Joshi 提交于
Add support for reporting thermal throttling events through SMI. Also, add a counter to count the number of throttling interrupts observed and report the count in the SMI event message. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 7月, 2020 3 次提交
-
-
由 Amber Lin 提交于
When the compute is malfunctioning or performance drops, the system admin will use SMI (System Management Interface) tool to monitor/diagnostic what went wrong. This patch provides an event watch interface for the user space to register devices and subscribe events they are interested. After registered, the user can use annoymous file descriptor's poll function with wait-time specified and wait for events to happen. Once an event happens, the user can use read() to retrieve information related to the event. VM fault event is done in this patch. v2: - remove UNREGISTER and add event ENABLE/DISABLE - correct kfifo usage - move event message API to kfd_ioctl.h v3: send the event msg in text than in binary v4: support multiple clients v5: move events enablement from ioctl to fd write v6: sparse fix Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Chengming Gui 提交于
Add callbacks to KGD for navy flounder. Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Chengming Gui 提交于
Add KFD support for Navy Flounder. Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 7月, 2020 1 次提交
-
-
由 Joseph Greathouse 提交于
Add support for GWS in Arcturus, which needs MEC2 firmware #48 or above. Fix the MEC2 version check for Vega 10 GWS support, since Vega 10 firmware adds 0x8000 to the actual firmware revision. We were previously declaring support where it did not exist. Signed-off-by: NJoseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: NKent Russell <kent.russell@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 7月, 2020 4 次提交
-
-
由 Felix Kuehling 提交于
Use WARN to print messages with backtrace when evictions are triggered. This can help determine the root cause of evictions and help spot driver bugs triggering evictions unintentionally, or help with performance tuning by avoiding conditions that cause evictions in a specific workload. The messages are controlled by a new module parameter that can be changed at runtime: echo Y > /sys/module/amdgpu/parameters/debug_evictions echo N > /sys/module/amdgpu/parameters/debug_evictions Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NPhilip Yang <Philip.Yang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
amdkfd add support for sienna_cichlid virtual function Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jay Cornwall 提交于
- Replace SQC stores with TCP stores - Synchronize with MSG_SAVEWAVE via lgkmcnt - HW_REG_IB_STS is now read-only Signed-off-by: NJay Cornwall <jay.cornwall@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
v4: drop get_tile_config, comment out other callbacks Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2020 1 次提交
-
-
由 Joseph Greathouse 提交于
Rather than only enabling GWS support based on the hws_gws_support modparm, also check whether the GPU's HWS firmware supports GWS. Leave the old modparm in place in case users want to test GWS on GPUs not yet in the support list. v2: fix broken syntax from the first patch. Signed-off-by: NJoseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 02 4月, 2020 1 次提交
-
-
由 Jack Zhang 提交于
Originally, it kfrees the wrong pointer for mem_obj. It would cause memory leak under stress test. Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Acked-by: NNirmoy Das <nirmoy.das@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 2月, 2020 1 次提交
-
-
由 Divya Shikre 提交于
Devices from Arcturus onwards will have their UUID exposed to Thunk. Adding neccessary functions to the kernel to propagate the uuid. Signed-off-by: NDivya Shikre <DivyaUday.Shikre@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 2月, 2020 1 次提交
-
-
由 Rajneesh Bhardwaj 提交于
So far the kfd driver implemented same routines for runtime and system wide suspend and resume (s2idle or mem). During system wide suspend the kfd aquires an atomic lock that prevents any more user processes to create queues and interact with kfd driver and amd gpu. This mechanism created problem when amdgpu device is runtime suspended with BACO enabled. Any application that relies on kfd driver fails to load because the driver reports a locked kfd device since gpu is runtime suspended. However, in an ideal case, when gpu is runtime suspended the kfd driver should be able to: - auto resume amdgpu driver whenever a client requests compute service - prevent runtime suspend for amdgpu while kfd is in use This change refactors the amdgpu and amdkfd drivers to support BACO and runtime power management. Reviewed-by: NOak Zeng <oak.zeng@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 1月, 2020 1 次提交
-
-
由 Felix Kuehling 提交于
Move HWS hang detection into unmap_queues_cpsch to catch hangs in all cases. If this happens during a reset, don't schedule another reset because the reset already in progress is expected to take care of it. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Tested-by: NEmily Deng <Emily.Deng@amd.com> Reviewed-by: Nshaoyunl <shaoyun.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 12月, 2019 1 次提交
-
-
由 Philip Yang 提交于
Because queue_work schedule the work on the same CPU the interrupt handler is running, if there are many interrupts pending, it takes longer time for work queue to start, or even worse system will hang. v2: queue work to same NUMA node for better cache locality v3: handle cpumask_next wraparound case Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NEric Huang <JinhuiEric.Huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 11月, 2019 1 次提交
-
-
由 yu kuai 提交于
Fixes gcc '-Wunused-but-set-variable' warning: drivers/gpu/drm/amd/amdkfd/kfd_device.c: In function ‘kgd2kfd_post_reset’: drivers/gpu/drm/amd/amdkfd/kfd_device.c:745:11: warning: variable ‘count’ set but not used [-Wunused-but-set-variable] 'count' is never used, so can be removed. Thus 'atomic_dec_return' can be replaced as 'atomic_dec' Fixes: e42051d2 ("drm/amdkfd: Implement GPU reset handlers in KFD") Signed-off-by: Nyu kuai <yukuai3@huawei.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 10月, 2019 1 次提交
-
-
由 Philip Yang 提交于
If device reset/suspend/resume failed for some reason, dqm lock is hold forever and this causes deadlock. Below is a kernel backtrace when application open kfd after suspend/resume failed. Instead of holding dqm lock in pre_reset and releasing dqm lock in post_reset, add dqm->sched_running flag which is modified in dqm->ops.start and dqm->ops.stop. The flag doesn't need lock protection because write/read are all inside dqm lock. For HWS case, map_queues_cpsch and unmap_queues_cpsch checks sched_running flag before sending the updated runlist. v2: For no-HWS case, when device is stopped, don't call load/destroy_mqd for eviction, restore and create queue, and avoid debugfs dump hdqs. Backtrace of dqm lock deadlock: [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more than 120 seconds. [Thu Oct 17 16:43:37 2019] Not tainted 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1 [Thu Oct 17 16:43:37 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Thu Oct 17 16:43:37 2019] rocminfo D 0 3024 2947 0x80000000 [Thu Oct 17 16:43:37 2019] Call Trace: [Thu Oct 17 16:43:37 2019] ? __schedule+0x3d9/0x8a0 [Thu Oct 17 16:43:37 2019] schedule+0x32/0x70 [Thu Oct 17 16:43:37 2019] schedule_preempt_disabled+0xa/0x10 [Thu Oct 17 16:43:37 2019] __mutex_lock.isra.9+0x1e3/0x4e0 [Thu Oct 17 16:43:37 2019] ? __call_srcu+0x264/0x3b0 [Thu Oct 17 16:43:37 2019] ? process_termination_cpsch+0x24/0x2f0 [amdgpu] [Thu Oct 17 16:43:37 2019] process_termination_cpsch+0x24/0x2f0 [amdgpu] [Thu Oct 17 16:43:37 2019] kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu] [Thu Oct 17 16:43:37 2019] kfd_process_notifier_release+0x1be/0x220 [amdgpu] [Thu Oct 17 16:43:37 2019] __mmu_notifier_release+0x3e/0xc0 [Thu Oct 17 16:43:37 2019] exit_mmap+0x160/0x1a0 [Thu Oct 17 16:43:37 2019] ? __handle_mm_fault+0xba3/0x1200 [Thu Oct 17 16:43:37 2019] ? exit_robust_list+0x5a/0x110 [Thu Oct 17 16:43:37 2019] mmput+0x4a/0x120 [Thu Oct 17 16:43:37 2019] do_exit+0x284/0xb20 [Thu Oct 17 16:43:37 2019] ? handle_mm_fault+0xfa/0x200 [Thu Oct 17 16:43:37 2019] do_group_exit+0x3a/0xa0 [Thu Oct 17 16:43:37 2019] __x64_sys_exit_group+0x14/0x20 [Thu Oct 17 16:43:37 2019] do_syscall_64+0x4f/0x100 [Thu Oct 17 16:43:37 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 10月, 2019 2 次提交
-
-
由 Alex Deucher 提交于
Add proper ifdefs around CIK code in kfd setup. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dan Carpenter 提交于
In the current code if "device_info" is ever NULL then the kernel will Oops so probably || was intended instead of &&. Fixes: e392c887 ("drm/amdkfd: Use array to probe kfd2kgd_calls") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 10月, 2019 3 次提交
-
-
由 Yong Zhao 提交于
This is the same idea as the kfd device info probe and move all the probe control together for easy maintenance. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Harish Kasiviswanathan 提交于
kfd needs drm_device to call into drm_cgroup functions Signed-off-by: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
Keep the same use of CHIP_IDs for navi12 in kfd Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-