- 12 1月, 2023 1 次提交
-
-
由 Mukul Joshi 提交于
commit cf97eb7e upstream. This patch fixes the following kernel warning seen during driver load by correctly initializing the p2plink attr before creating the sysfs file: [ +0.002865] ------------[ cut here ]------------ [ +0.002327] kobject: '(null)' (0000000056260cfb): is not initialized, yet kobject_put() is being called. [ +0.004780] WARNING: CPU: 32 PID: 1006 at lib/kobject.c:718 kobject_put+0xaa/0x1c0 [ +0.001361] Call Trace: [ +0.001234] <TASK> [ +0.001067] kfd_remove_sysfs_node_entry+0x24a/0x2d0 [amdgpu] [ +0.003147] kfd_topology_update_sysfs+0x3d/0x750 [amdgpu] [ +0.002890] kfd_topology_add_device+0xbd7/0xc70 [amdgpu] [ +0.002844] ? lock_release+0x13c/0x2e0 [ +0.001936] ? smu_cmn_send_smc_msg_with_param+0x1e8/0x2d0 [amdgpu] [ +0.003313] ? amdgpu_dpm_get_mclk+0x54/0x60 [amdgpu] [ +0.002703] kgd2kfd_device_init.cold+0x39f/0x4ed [amdgpu] [ +0.002930] amdgpu_amdkfd_device_init+0x13d/0x1f0 [amdgpu] [ +0.002944] amdgpu_device_init.cold+0x1464/0x17b4 [amdgpu] [ +0.002970] ? pci_bus_read_config_word+0x43/0x80 [ +0.002380] amdgpu_driver_load_kms+0x15/0x100 [amdgpu] [ +0.002744] amdgpu_pci_probe+0x147/0x370 [amdgpu] [ +0.002522] local_pci_probe+0x40/0x80 [ +0.001896] work_for_cpu_fn+0x10/0x20 [ +0.001892] process_one_work+0x26e/0x5a0 [ +0.002029] worker_thread+0x1fd/0x3e0 [ +0.001890] ? process_one_work+0x5a0/0x5a0 [ +0.002115] kthread+0xea/0x110 [ +0.001618] ? kthread_complete_and_exit+0x20/0x20 [ +0.002422] ret_from_fork+0x1f/0x30 [ +0.001808] </TASK> [ +0.001103] irq event stamp: 59837 [ +0.001718] hardirqs last enabled at (59849): [<ffffffffb30fab12>] __up_console_sem+0x52/0x60 [ +0.004414] hardirqs last disabled at (59860): [<ffffffffb30faaf7>] __up_console_sem+0x37/0x60 [ +0.004414] softirqs last enabled at (59654): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130 [ +0.004205] softirqs last disabled at (59649): [<ffffffffb307d9c7>] irq_exit_rcu+0xd7/0x130 [ +0.004203] ---[ end trace 0000000000000000 ]--- Fixes: 0f28cca8 ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 17 8月, 2022 1 次提交
-
-
由 Dan Carpenter 提交于
This code has two bugs. If kfd_topology_device_by_proximity_domain() failed on the first iteration through the loop then "cpu_link" is uninitialized and should not be dereferenced. The second bug is that we cannot dereference a list iterator when it points to the list head. In other words, if we exit the list_for_each_entry() loop exits without hitting a break then "cpu_link" is not a valid pointer and should not be dereferenced. Fix both of these problems by setting "cpu_link" to NULL when it is invalid and non-NULL when it is valid. That makes it easier to test for valid vs invalid. Fixes: 0f28cca8 ("drm/amdkfd: Extend KFD device topology to surface peer-to-peer links") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 6月, 2022 1 次提交
-
-
由 Alex Deucher 提交于
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_topology.c:1542:11: warning: variable 'i' set but not used [-Wunused-but-set-variable] Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 6月, 2022 1 次提交
-
-
由 Ramesh Errabolu 提交于
The field is redundant and does not serve any functional role Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 6月, 2022 1 次提交
-
-
由 Ramesh Errabolu 提交于
Extend KFD device topology to surface peer-to-peer links among GPU devices connected over PCIe or xGMI. Enabling HSA_AMD_P2P is REQUIRED to surface peer-to-peer links. Prior to this KFD did not expose to user mode any P2P links or indirect links that go over two or more direct hops. Old versions of the Thunk used to make up their own P2P and indirect links without the information about peer-accessibility and chipset support available to the kernel mode driver. In this patch we expose P2P links in a new sysfs directory to provide more reliable P2P link information to user mode. Old versions of the Thunk will continue to work as before and ignore the new directory. This avoids conflicts between P2P links exposed by KFD and P2P links created by the Thunk itself. New versions of the Thunk will use only the P2P links provided in the new p2p_links directory, if it exists, or fall back to the old code path on older KFDs that don't expose p2p_links. Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 5月, 2022 1 次提交
-
-
由 Jonathan Kim 提交于
CPU hive assignment currently assumes when a GPU hive is connected_to_cpu, there is only one hive in the system. Only assign CPUs to the hive if they are explicitly directly connected to the GPU hive to get rid of the need for this assumption. It's more efficient to do this when querying IO links since other non-CRAT info has to be filled in anyways. Also, stop re-assigning the same CPU to the same GPU hive if it has already been done before. Signed-off-by: NJonathan Kim <jonathan.kim@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 5月, 2022 1 次提交
-
-
由 Mukul Joshi 提交于
Add initial support for soc21 in KFD compute driver (Mukul) - Add new definition for soc21 device. - Add new file for amdgpu-kfd interface for GFX11 family. - Add new file for queue management, interrupt handling, mqd management for GFX11 family in KFD driver. - Related changes/updates for soc21 device in KFD driver. - Repurpose last 2 entries of SDMA MQD for driver use. v2: Add an optional argument into update queue operation (Mukul) v3: Switch to ip version check, replace kgd_dev with amdgpu_device (Hawking) Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2022 2 次提交
-
-
由 Mukul Joshi 提交于
[ 168.544078] ====================================================== [ 168.550309] WARNING: possible circular locking dependency detected [ 168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G E [ 168.562558] ------------------------------------------------------ [ 168.568764] kfdtest/3479 is trying to acquire lock: [ 168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at: kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.583663] but task is already holding lock: [ 168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0xa9/0x180 [ 168.597755] which lock already depends on the new lock. [ 168.605970] the existing dependency chain (in reverse order) is: [ 168.613487] -> #3 (&mm->mmap_lock#2){++++}-{3:3}: [ 168.619700] lock_acquire+0xca/0x2e0 [ 168.623814] down_read+0x3e/0x140 [ 168.627676] do_user_addr_fault+0x40d/0x690 [ 168.632399] exc_page_fault+0x6f/0x270 [ 168.636692] asm_exc_page_fault+0x1e/0x30 [ 168.641249] filldir64+0xc8/0x1e0 [ 168.645115] call_filldir+0x7c/0x110 [ 168.649238] ext4_readdir+0x58e/0x940 [ 168.653442] iterate_dir+0x16a/0x1b0 [ 168.657558] __x64_sys_getdents64+0x83/0x140 [ 168.662375] do_syscall_64+0x35/0x80 [ 168.666492] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 168.672095] -> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}: [ 168.679008] lock_acquire+0xca/0x2e0 [ 168.683122] down_read+0x3e/0x140 [ 168.686982] path_openat+0x5b2/0xa50 [ 168.691095] do_file_open_root+0xfc/0x190 [ 168.695652] file_open_root+0xd8/0x1b0 [ 168.702010] kernel_read_file_from_path_initns+0xc4/0x140 [ 168.709542] _request_firmware+0x2e9/0x5e0 [ 168.715741] request_firmware+0x32/0x50 [ 168.721667] amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu] [ 168.730060] smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu] [ 168.738414] fiji_start_smu+0xcf/0x4e0 [amdgpu] [ 168.745539] pp_dpm_load_fw+0x21/0x30 [amdgpu] [ 168.752503] amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu] [ 168.760698] amdgpu_device_fw_loading+0xb8/0x140 [amdgpu] [ 168.768412] amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu] [ 168.776285] amdgpu_driver_load_kms+0x15/0x120 [amdgpu] [ 168.784034] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu] [ 168.791161] local_pci_probe+0x40/0x80 [ 168.797027] work_for_cpu_fn+0x10/0x20 [ 168.802839] process_one_work+0x273/0x5b0 [ 168.808903] worker_thread+0x20f/0x3d0 [ 168.814700] kthread+0x176/0x1a0 [ 168.819968] ret_from_fork+0x1f/0x30 [ 168.825563] -> #1 (&adev->pm.mutex){+.+.}-{3:3}: [ 168.834721] lock_acquire+0xca/0x2e0 [ 168.840364] __mutex_lock+0xa2/0x930 [ 168.846020] amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu] [ 168.853257] amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu] [ 168.861547] kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu] [ 168.869478] kfd_create_crat_image_virtual+0x447/0x510 [amdgpu] [ 168.877884] kfd_topology_add_device+0x5c8/0x6f0 [amdgpu] [ 168.885556] kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu] [ 168.893347] amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu] [ 168.901177] amdgpu_device_init.cold+0x141b/0x1716 [amdgpu] [ 168.909025] amdgpu_driver_load_kms+0x15/0x120 [amdgpu] [ 168.916458] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu] [ 168.923442] local_pci_probe+0x40/0x80 [ 168.929249] work_for_cpu_fn+0x10/0x20 [ 168.935008] process_one_work+0x273/0x5b0 [ 168.940944] worker_thread+0x20f/0x3d0 [ 168.946623] kthread+0x176/0x1a0 [ 168.951765] ret_from_fork+0x1f/0x30 [ 168.957277] -> #0 (&topology_lock){++++}-{3:3}: [ 168.965993] check_prev_add+0x8f/0xbf0 [ 168.971613] __lock_acquire+0x1299/0x1ca0 [ 168.977485] lock_acquire+0xca/0x2e0 [ 168.982877] down_read+0x3e/0x140 [ 168.987975] kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.995583] kfd_device_by_id+0xa/0x20 [amdgpu] [ 169.002180] kfd_mmap+0x95/0x200 [amdgpu] [ 169.008293] mmap_region+0x337/0x5a0 [ 169.013679] do_mmap+0x3aa/0x540 [ 169.018678] vm_mmap_pgoff+0xdc/0x180 [ 169.024095] ksys_mmap_pgoff+0x186/0x1f0 [ 169.029734] do_syscall_64+0x35/0x80 [ 169.035005] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 169.041754] other info that might help us debug this: [ 169.053276] Chain exists of: &topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2 [ 169.068389] Possible unsafe locking scenario: [ 169.076661] CPU0 CPU1 [ 169.082383] ---- ---- [ 169.088087] lock(&mm->mmap_lock#2); [ 169.092922] lock(&type->i_mutex_dir_key#6); [ 169.100975] lock(&mm->mmap_lock#2); [ 169.108320] lock(&topology_lock); [ 169.112957] *** DEADLOCK *** This commit fixes the deadlock warning by ensuring pm.mutex is not held while holding the topology lock. For this, kfd_local_mem_info is moved into the KFD dev struct and filled during device init. This cached value can then be used instead of querying the value again and again. Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mukul Joshi 提交于
The logic to update the IO links when a KFD device is removed was not correct as it would miss updating the proximity domain values for some nodes where the node_from and node_to both were greater values than the proximity domain value of the KFD device being removed from topology. Fixes: 46d18d51 ("drm/amdkfd: Cleanup IO links during KFD device removal") Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 4月, 2022 1 次提交
-
-
由 Mukul Joshi 提交于
Currently, the IO-links to the device being removed from topology, are not cleared. As a result, there would be dangling links left in the KFD topology. This patch aims to fix the following: 1. Cleanup all IO links to the device being removed. 2. Ensure that node numbering in sysfs and nodes proximity domain values are consistent after the device is removed: a. Adding a device and removing a GPU device are made mutually exclusive. b. The global proximity domain counter is no longer required to be an atomic counter. A normal 32-bit counter can be used instead. 3. Update generation_count to let user-mode know that topology has changed due to device removal. CC: Shuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: NShuotao Xu <shuotaoxu@microsoft.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 2月, 2022 2 次提交
-
-
由 Rajneesh Bhardwaj 提交于
A bunch of errors and warnings are leftover KFD over the years, attempt to fix the errors and most warnings reported by checkpatch tool. Still a few warnings remain which may be false positives so ignore them for now. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rajneesh Bhardwaj 提交于
Update the SPDX License header for all the KFD files. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 02 12月, 2021 2 次提交
-
-
由 Graham Sider 提交于
Initializes kfd->device_info given either asic_type (enum) if GFX version is less than GFX9, or GC IP version if greater. Also takes in vf and the target compiler gfx version. Uses SDMA version to determine num_sdma_queues_per_engine. Convert device_info to a non-pointer member of kfd, change references accordingly. Change unsupported asic condition to only probe f2g, move device_info initialization post-switch. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
device_info->asic_name and amdgpu_asic_name[adev->asic_type] both provide asic name strings, with the only difference being casing. Remove asic_name from device_info and replace sysfs entry with lowercase amdgpu_asic_name[]. Ensures string is null-terminated so that this doesn't break if dev->node_props.name ever gets set anywhere else. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 11月, 2021 1 次提交
-
-
由 Amber Lin 提交于
Instead of hard coding the number of sdma engines and the number of sdma_xgmi engines in the device_info table, get the number of toal SDMA instances from amdgpu. The first two engines are sdma engines and the rest are sdma-xgmi engines unless the ASIC doesn't support XGMI. v2: add kfd_ prefix to non static function names Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 11月, 2021 6 次提交
-
-
由 Graham Sider 提交于
asic_family was a duplicate of asic_type, both of type amd_asic_type. Replace all instances of device_info->asic_family with adev->asic_type and remove asic_family from device_info. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
Switch to IP version checking instead of asic_type on various KFD version checks. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
Converts KFD switch statements to use IP version checking instead of asic_type. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
These get funcs simply return an adev field. Replace funcs/calls with direct field accesses instead. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
Remove get_amdgpu_device and other remaining kgd_dev references aside from declaration/kfd struct entry and initialization. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Graham Sider 提交于
Modified definitions: - amdgpu_amdkfd_get_fw_version - amdgpu_amdkfd_get_local_mem_info - amdgpu_amdkfd_get_gpu_clock_counter - amdgpu_amdkfd_get_max_engine_clock_in_mhz - amdgpu_amdkfd_get_cu_info - amdgpu_amdkfd_get_dmabuf_info - amdgpu_amdkfd_get_vram_usage - amdgpu_amdkfd_get_hive_id - amdgpu_amdkfd_get_unique_id - amdgpu_amdkfd_get_mmio_remap_phys_addr - amdgpu_amdkfd_get_num_gws - amdgpu_amdkfd_get_asic_rev_id - amdgpu_amdkfd_get_noretry - amdgpu_amdkfd_get_xgmi_hops_count - amdgpu_amdkfd_get_xgmi_bandwidth_mbytes - amdgpu_amdkfd_get_pcie_bandwidth_mbytes Also replaces kfd_device_by_kgd with kfd_device_by_adev, now searching via adev rather than kgd. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 10月, 2021 1 次提交
-
-
由 Jonathan Kim 提交于
ROCr needs to be able to identify all devices that have direct access to fine grain memory, which should include CPUs that are connected to GPUs over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the CPU is part of the hive. Signed-off-by: NJonathan Kim <jonathan.kim@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 8月, 2021 1 次提交
-
-
由 Graham Sider 提交于
Add u32 gfx_target_version field to kfd_node_properties and kfd_device_info. Populate <asic>_device_info structs accordingly and expose to sysfs. This allows eliminating device-ID-based lookup tables in user mode for future ASICs. Signed-off-by: NGraham Sider <Graham.Sider@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 7月, 2021 2 次提交
-
-
由 Tao Zhou 提交于
Add KFD support for cyan_skillfish. v2: whitespace fixes (Alex) Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Renaming packets to packet_mgr to reflect the real meaning of this variable. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Acked-by: NChristian Konig <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 6月, 2021 1 次提交
-
-
由 Eric Huang 提交于
Fix non-coherent bit of iolink properties flag which always is 0. Signed-off-by: NEric Huang <jinhuieric.huang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 6月, 2021 1 次提交
-
-
由 Felix Kuehling 提交于
When some GPUs don't support SVM, don't disabe it for the entire process. That would be inconsistent with the information the process got from the topology, which indicates SVM support per GPU. Instead disable SVM support only for the unsupported GPUs. This is done by checking any per-device attributes against the bitmap of supported GPUs. Also use the supported GPU bitmap to initialize access bitmaps for new SVM address ranges. Don't handle recoverable page faults from unsupported GPUs. (I don't think there will be unsupported GPUs that can generate recoverable page faults. But better safe than sorry.) Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NPhilip Yang <philip.yang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 6月, 2021 1 次提交
-
-
由 Aaron Liu 提交于
This patch is to add GFX10 based Yellow Carp KFD support. We will bypass IOMMU v2. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 5月, 2021 1 次提交
-
-
由 Chengming Gui 提交于
Add KFD support for beige_goby v2: fix asic name typo v3: squash in updates (Alex) v4: squash in needs_atomics fix (Alex) Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 5月, 2021 3 次提交
-
-
由 Luben Tuikov 提交于
Rename, ras_hw_supported --> ras_hw_enabled, and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NJohn Clements <John.Clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jonathan Kim 提交于
To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. Signed-off-by: NJonathan Kim <jonathan.kim@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jonathan Kim 提交于
Link atomics support over xGMI should be reported independently of PCIe. Do not set NO_ATOMICS flags on devices that support xGMI but that do not have atomics support over PCIe. Signed-off-by: NJonathan Kim <jonathan.kim@amd.com> Tested-by: NRamesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: NFelix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 21 4月, 2021 1 次提交
-
-
由 Philip Yang 提交于
SVMAPISupported property added to HSA_CAPABILITY, the value match HSA_CAPABILITY defined in Thunk spec: SVMAPISupported: it will not be supported on older kernels that don't have HMM or on systems with GFXv8 or older GPUs without support for 48-bit virtual addresses. CoherentHostAccess property added to HSA_MEMORYPROPERTY, the value match HSA_MEMORYPROPERTY defined in Thunk spec: CoherentHostAccess: whether or not device memory can be coherently accessed by the host CPU. Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 3月, 2021 1 次提交
-
-
由 Yong Zhao 提交于
Add initial KFD support. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 2月, 2021 1 次提交
-
-
由 Kent Russell 提交于
Instead of caching the value during amdgpu_device_init, just call the function directly. This avoids issues where the unique_id hasn't been saved by the time that KFD's topology snapshot is done (e.g. Arcturus). KFD's topology information from the amdgpu_device was initially cached at KFD initialization due to amdkfd and amdgpu being separate modules. Now that they are combined together, we can directly call the functions that we need and avoid this unnecessary duplication and complexity. As a side-effect of this change, we also remove unique_id=0 for CPUs, which is obviously not unique. v2: Drop previous patch printing unique_id in hex Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 30 10月, 2020 1 次提交
-
-
由 Kent Russell 提交于
Since the unique_id is now obtained in amdgpu in smu_late_init, topology misses getting the value during KFD device initialization. To work around this, we use amdgpu_amdkfd_get_unique_id to get the unique_id at read time. Due to this, we can remove unique_id from the kfd_dev structure, since we only need it in the KFD node properties struct Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 10月, 2020 1 次提交
-
-
由 Chengming Gui 提交于
Add KFD support for dimgrey cavefish. v2: rebase (Alex) Signed-off-by: NChengming Gui <Jack.Gui@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 10月, 2020 1 次提交
-
-
由 Huang Rui 提交于
This patch is to add GFX10 based APU Van Gogh KFD support. We will treat Van Gogh as "dgpu" (bypass IOMMU v2). Signed-off-by: NHuang Rui <ray.huang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 8月, 2020 2 次提交
-
-
由 Stanley.Yang 提交于
The ctx->features are new RAS implementation which is only available for Vega20 and onwards, it is not available for vega10, vega10 should follow legacy ECC implementation. Changed from V1: wrap function to initialize kfd node properties Changed from V2: remove wrap function and SDMA SRAM ECC check Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
We still have a few iommu issues which need to address, so force raven as "dgpu" path for the moment. This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled or ACPI CRAT table not correct. v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2. v3: Align with existed thunk, don't change the way of raven, only renoir will use "dgpu" path by default. v4: don't update global ignore_crat in the driver, and revise fallback function if CRAT is broken. v5: refine acpi crat good but no iommu support case, and rename the title. v6: fix the issue of dGPU initialized firstly, just modify the report value in the node_show(). Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-