- 05 Jun, 2021 4 commits
-
-
Committed by Aaron Liu
This patch adds GFX10-based Yellow Carp KFD support. IOMMU v2 is bypassed. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This optimizes memory mapping latency and also avoids a page fault in the corner case where a valid PDE is changed into a PTE. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This is part of the memory mapping optimization. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This provides more TLB flush type options for different scenarios. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 22 May, 2021 1 commit
-
-
Committed by Guenter Roeck
The first parameter passed to container_of() is the pointer to the work structure passed to the worker and is never NULL. The NULL check on the result of container_of() is therefore unnecessary and misleading. Remove it. This change was made automatically with the following Coccinelle script. @@ type t; identifier v; statement s; @@ <+... ( t v = container_of(...); | v = container_of(...); ) ... when != v - if (\( !v \| v == NULL \) ) s ...+> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
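A minimal userspace sketch of why the removed check was dead code, assuming a simplified `container_of()` built on `offsetof`. The `demo_*` names and the stand-in `struct work_struct` are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel work item type. */
struct work_struct {
	int pending;
};

/* Hypothetical structure that embeds a work item. */
struct demo_worker {
	int id;
	struct work_struct work;
};

/*
 * Worker callback. 'work' always points at a member embedded in a
 * demo_worker, so container_of() just subtracts a constant offset and
 * the result cannot be NULL. An "if (!w)" test here would never fire.
 */
int demo_worker_id(struct work_struct *work)
{
	struct demo_worker *w = container_of(work, struct demo_worker, work);

	return w->id;
}

/* Builds a worker and invokes the callback the way a work queue would. */
int demo_run(void)
{
	struct demo_worker w = { .id = 7, .work = { 0 } };

	return demo_worker_id(&w.work);
}
```

Because the result is plain pointer arithmetic on a non-NULL input, a NULL test on it only suggests an error path that cannot occur.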
-
- 20 May, 2021 12 commits
-
-
Committed by Andrey Grodzovsky
Use it to call display code dependent on device->drv_data before it's set to NULL on device unplug. v5: Move HW finalization into this callback to prevent MMIO accesses post PCI remove. v7: Split KFD suspend from device exit to expedite HW related stuff to amdgpu_pci_remove. v8: Squash previous KFD commit into this commit to avoid compile break. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com
-
Committed by Dennis Li
The function kfd_lookup_process_by_pasid increases the reference count of the kfd_process object, so its caller should call kfd_unref_process to decrease the reference count. Otherwise a resource leak will happen. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
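The lookup/unref pairing described above can be sketched in plain C as follows. This is an illustration only: `demo_lookup_by_pasid`, `demo_unref`, and the table are hypothetical stand-ins, not the real kfd_lookup_process_by_pasid or kfd_unref_process implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted process object. */
struct demo_process {
	int pasid;
	int refcount;
};

/* Each entry starts with one reference held by the table itself. */
struct demo_process demo_table[] = { { 1, 1 }, { 2, 1 } };

/* Lookup takes an extra reference on success; the caller owns it. */
struct demo_process *demo_lookup_by_pasid(int pasid)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].pasid == pasid) {
			demo_table[i].refcount++;
			return &demo_table[i];
		}
	}
	return NULL;
}

/* Drops the reference taken by a successful lookup. */
void demo_unref(struct demo_process *p)
{
	p->refcount--;
}

/* Every successful lookup is balanced by exactly one unref. */
int demo_handle_event(int pasid)
{
	struct demo_process *p = demo_lookup_by_pasid(pasid);

	if (!p)
		return -1;	/* nothing was referenced, nothing to drop */

	/* ... use p here ... */

	demo_unref(p);
	return 0;
}
```

If `demo_handle_event` returned without calling `demo_unref`, the count would grow by one on every call and the object could never be freed, which is the leak the commit message describes.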
-
Committed by Chengming Gui
Add the function pointer. Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Chengming Gui
Add KFD support for beige_goby. v2: fix asic name typo v3: squash in updates (Alex) v4: squash in needs_atomics fix (Alex) Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
We need to do a heavy-weight TLB flush to make sure there is no more dirty data in the cache for the unmapped pages. Define enum TLB_FLUSH_TYPE and add a flush_type parameter to amdgpu_amdkfd_flush_gpu_tlb_pasid. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
This reverts commit 1704ac8e. After "drm/amdgpu: flush TLB if valid PDE turns into PTE" is checked in, this workaround is not needed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
Mapping a huge page (a 2MB-aligned address with 2MB size) uses PDE0 as a PTE. If a previously valid PDE0 (PDE0.V=1 and PDE0.P=0) turns into a PTE, a TLB flush is required; otherwise the page table walker will not read the updated PDE0. Change the page table update mapping to return a table_freed flag indicating that a previously valid PDE may have turned into a PTE if a page table is freed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Christian König
Now that we found the underlying problem we can re-apply this patch. This reverts commit 6b44b667. v2: rebase on KFD changes Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Felix Kuehling
MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance for coarse-grained memory where cache coherence during shader execution is not required. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Felix Kuehling
MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance for coarse-grained memory where cache coherence during shader execution is not required. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Dennis Li
User applications may register the KFD_EVENT_TYPE_HW_EXCEPTION and KFD_EVENT_TYPE_MEMORY events, so the driver can notify them when poison data is consumed. Besides that, some applications may register a SIGBUS signal handler. These applications will handle poison data by themselves, exiting or re-creating the context to re-dispatch work. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
If xnack is on and a new range is created, either to recover a retry VM fault or by SVM API calls, set all GPUs to have access to the range. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 11 May, 2021 10 commits
-
-
Committed by Philip Yang
If the migration copy fails because the process is killed, or because VRAM or system memory runs out, pass the error code back to the caller so it can handle the error gracefully. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Luben Tuikov
Rename ras_hw_supported --> ras_hw_enabled and ras_features --> ras_enabled, to show that ras_enabled is a subset of ras_hw_enabled, which itself is a subset of the ASIC capability. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
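The subset relation between the renamed fields can be illustrated with plain bitmasks. This sketch assumes the fields are feature bitmasks and that each narrower mask is obtained by AND-ing; the helper names and the mask values are hypothetical, and the driver's actual computation may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every bit set in 'sub' is also set in 'super'. */
int demo_is_subset(uint32_t sub, uint32_t super)
{
	return (sub & ~super) == 0;
}

/* Assumed model: what the hardware enables is limited by ASIC capability. */
uint32_t demo_ras_hw_enabled(uint32_t asic_cap, uint32_t hw_mask)
{
	return asic_cap & hw_mask;
}

/* Assumed model: what is finally enabled is limited by ras_hw_enabled. */
uint32_t demo_ras_enabled(uint32_t ras_hw_enabled, uint32_t requested)
{
	return ras_hw_enabled & requested;
}
```

With `asic_cap = 0x0F` and `hw_mask = 0x3C`, the hardware-enabled mask is `0x0C`; a request of `0x0A` then leaves `0x08` enabled, and each mask is a subset of the one before it.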
-
Committed by Eric Huang
With an NPS4 BIOS we need to find the closest NUMA node when creating the topology IO link between CPU and GPU, if the PCI driver doesn't set it. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Mike Li
The L1 cache information has been updated and the L2/L3 information has been added. The changes have been made for Vega10 and newer ASICs. There are no changes for the older ASICs before Vega10. Signed-off-by: Mike Li <Tianxinmike.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
To account for various PCIe and xGMI setups, check the no atomics settings for a device in relation to every direct peer. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Felix Kuehling
This function is only used in this source file. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
Work around the situation where VM retry faults keep coming after a page table update. We are investigating the root cause, but once this issue happens, the application gets stuck and sometimes a reboot is needed to recover. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Zhigang Luo
Update kfd_supported_devices to enable Aldebaran virtualization support. Signed-off-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
GPUs connected to CPUs over xGMI are bidirectional, so set the weight as a single hop both ways. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
Link atomics support over xGMI should be reported independently of PCIe. Do not set NO_ATOMICS flags on devices that support xGMI but that do not have atomics support over PCIe. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 29 Apr, 2021 9 commits
-
-
Committed by Harish Kasiviswanathan
v2: updated MEC FW version after validating gws with debugger Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
After draining the stale retry fault, or after failing to validate the range to recover, the fault address has to be removed from the fault filter ring so that a subsequent retry interrupt on the same address can be handled. Otherwise the retry fault will not be processed for recovery until the timeout has passed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
A retry fault interrupt may be pending in the IH ring after the GPU page table is updated to recover the VM fault, because each page of the range generates a retry fault interrupt. There is a race if the application unmaps the range, removing and freeing it first, and the retry fault work restore_pages then handles the retry fault interrupt: because the range cannot be found, the VM fault cannot be recovered and an incorrect GPU VM fault is reported to the application. Before unmap removes and frees the range, drain retry fault interrupts from IH ring1 to ensure no retry fault arrives after the range is removed. Draining retry fault interrupts skips a range that is on the deferred list for removal, or a child range that was split by unmap and has not been added to svms or given an interval notifier. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
GPU VM retry fault recovery needs to retry range validation if: 1. the range is split in parallel by unmap during recovery; 2. the range migrates to system memory and is updated in system memory during recovery. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
The plural of 'process' should be 'processes'. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
Use devm_memunmap_pages instead of memunmap_pages to release pgmap and remove pgmap from the device action, to avoid a double free of pgmap when unloading the driver module. Release the device memory region if creating the device memory pages structure fails. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Colin Ian King
There is a spelling mistake in a pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Hawking Zhang
amdgpu.h is included in kfd_priv.h. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Fabio M. De Francesco
Fixed a kernel-doc error in the documentation of a function. Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 24 Apr, 2021 4 commits
-
-
Committed by Colin Ian King
The variable r is being initialized with a value that is never read and it is being updated later with a new value. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Colin Ian King
Currently the call to kfd_process_gpuidx_from_gpuid returns an int value that is assigned to a uint32_t variable gpuidx, which is then checked for a negative error return; that check is always false. Fix this by making gpuidx an int32_t. This also makes gpuidx type-consistent with the use of gpuidx in the callers. Addresses-Coverity: ("Unsigned compared against 0") Fixes: cda0f85b ("drm/amdkfd: refine migration policy with xnack on") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
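The bug class fixed here can be reproduced in a few lines of standalone C. The sketch below uses a hypothetical `demo_gpuidx_from_gpuid` in place of kfd_process_gpuidx_from_gpuid; the ID value and the error code are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lookup: returns an index >= 0 or a negative error code. */
int demo_gpuidx_from_gpuid(uint32_t gpuid)
{
	return gpuid == 0x1234 ? 0 : -22;
}

/* Buggy variant: the negative result is stored in an unsigned variable. */
int demo_check_buggy(uint32_t gpuid)
{
	uint32_t gpuidx = demo_gpuidx_from_gpuid(gpuid);

	if (gpuidx < 0)		/* always false for an unsigned type */
		return -1;
	return 0;
}

/* Fixed variant: a signed variable keeps the error code negative. */
int demo_check_fixed(uint32_t gpuid)
{
	int32_t gpuidx = demo_gpuidx_from_gpuid(gpuid);

	if (gpuidx < 0)
		return -1;
	return 0;
}
```

In the buggy variant the error code wraps to a large positive value, so an unknown ID is silently treated as valid. Compilers can flag the dead comparison (for example with `-Wtype-limits` in GCC), which is the same pattern Coverity reports as "Unsigned compared against 0".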
-
Committed by Alex Sierra
The access attribute value for default ranges is set based on whether the process has xnack on or off. With XNACK on, unregistered ranges have the GPU access attribute through page faults, while with XNACK off unregistered ranges have no access attribute. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Sierra
SVM ranges are created for unregistered memory, triggered by page faults. These ranges are migrated/mapped to GPU VRAM memory. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-