- 20 May 2021, 6 commits
-
-
Submitted by Felix Kuehling

Add BO-type specific helper functions to DMA-map and unmap kfd_mem_attachments. Implement this functionality for userptrs by creating one SG BO per GPU and filling it with a DMA mapping of the pages from the original mem->bo.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
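A minimal sketch of what such a per-GPU userptr DMA-mapping helper could look like. The function name and the pages/npages parameters are illustrative assumptions, not the upstream helper, which operates on kfd_mem_attachment and the per-GPU SG BO; the unmap path would reverse this with dma_unmap_sgtable() and sg_free_table().

```c
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* Illustrative sketch only: build an SG table over the userptr pages of
 * the original mem->bo and DMA-map it with one specific GPU's device,
 * yielding DMA addresses that are valid for that GPU only. */
static int kfd_dmamap_userptr_sketch(struct device *dev, struct page **pages,
				     unsigned int npages, struct sg_table *sgt)
{
	int ret;

	/* One SG table per GPU, all covering the same pinned pages. */
	ret = sg_alloc_table_from_pages(sgt, pages, npages, 0,
					(unsigned long)npages << PAGE_SHIFT,
					GFP_KERNEL);
	if (ret)
		return ret;

	/* Create the per-device DMA mapping. */
	ret = dma_map_sgtable(dev, sgt, DMA_BIDIRECTIONAL, 0);
	if (ret)
		sg_free_table(sgt);

	return ret;
}
```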
-
Submitted by Felix Kuehling

Do the AQL queue double-mapping with a single attach call. That will make it easier to create per-GPU BOs later, to be shared between the two BO VA mappings on the same GPU. Freeing the attachments is not necessary if map_to_gpu fails; they will be cleaned up when the kgd_mem object is destroyed in amdgpu_amdkfd_gpuvm_free_memory_of_gpu.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Felix Kuehling

For now they all reference the same BO. For correct DMA mappings they will refer to different BOs per GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Felix Kuehling

This name is more fitting, especially for the changes coming next to support multi-GPU systems with proper DMA mappings. Cleaned up the code and renamed some related functions and variables to improve readability.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Felix Kuehling

MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance with coarse-grained memory, where cache coherence during shader execution is not required.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
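A rough sketch of the mapping-type decision this describes; the real logic lives in the per-ASIC PTE-flag helpers, and the function name here is made up. The UC/NC mapping flags themselves are the ones exposed in the amdgpu uapi header.

```c
#include <drm/amdgpu_drm.h>

/* Sketch: choose the GPUVM memory type for a KFD allocation.
 * Coarse-grained (non-coherent) memory no longer needs MTYPE UC,
 * so NC is used for better performance. */
static uint32_t kfd_mtype_flags_sketch(bool coherent)
{
	return coherent ? AMDGPU_VM_MTYPE_UC : AMDGPU_VM_MTYPE_NC;
}
```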
-
Submitted by Felix Kuehling

MTYPE UC was used for a specific use case that ended up not being implemented. Use NC for better performance with coarse-grained memory, where cache coherence during shader execution is not required.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 21 April 2021, 6 commits
-
-
Submitted by Alex Sierra

Add a CREATE_SVM_BO define bit for SVM BOs. Another define flag was moved so that these KFD-type flags are concentrated in one include file.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Alex Sierra

[why] As part of the SVM functionality, the eviction mechanism used for SVM BOs is different: it uses one eviction fence per prange instead of one fence per kfd_process.
[how] Add an svm_bo reference to amdgpu_amdkfd_fence to allow differentiating between SVM BO and regular BO evictions. This also includes modifications to set the reference at the fence creation call.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Philip Yang

HMM migration allocates sizeof(struct page) of system memory for each VRAM page, which means 1GB of system memory is reserved for 64GB of VRAM. To avoid application OOM, increase the accounted system memory usage based on the VRAM size of all GPUs, so that application allocations fail once system memory usage reaches the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
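A sketch of the arithmetic behind the 1GB-per-64GB figure; the function and its use in the accounting counters are illustrative, not the upstream field names. With 4KiB pages and a struct page of typically 64 bytes, 64GiB of VRAM needs 16M struct pages, i.e. about 1GiB of system memory.

```c
#include <linux/mm.h>

/* Sketch only: system-memory overhead of HMM migration metadata for a
 * given VRAM size. This amount would be added to the accounted system
 * memory usage for each GPU. */
static u64 vram_page_struct_overhead(u64 vram_size)
{
	/* one struct page per VRAM page */
	return (vram_size >> PAGE_SHIFT) * sizeof(struct page);
}
```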
-
Submitted by Felix Kuehling

Use amdgpu_vm_bo_update_mapping to update the GPU page tables, mapping or unmapping the system memory page addresses of an svm range on the GPUs.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Felix Kuehling

DRM render node file handles are used by the Thunk for CPU mapping of BOs using mmap. It uses the DRM render node of the GPU where the BO was allocated. DRM allows mmap access automatically when it creates a GEM handle for a BO. KFD BOs don't have GEM handles, so KFD needs to manage access manually. Use drm_vma_node_allow to allow user mode to mmap BOs allocated with kfd_ioctl_alloc_memory_of_gpu through the DRM render node that was used in the kfd_ioctl_acquire_vm call for the same GPU.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
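A minimal sketch of the drm_vma_node_allow() call this relies on; the wrapping function and structure plumbing around it are assumptions, and the allow would be revoked with drm_vma_node_revoke() when the BO is freed.

```c
#include <drm/drm_file.h>
#include <drm/drm_gem.h>
#include <drm/drm_vma_manager.h>

/* Sketch: whitelist the render-node file for mmap of a KFD BO that has
 * no GEM handle, so the mmap offset check in DRM succeeds for it. */
static int kfd_allow_mmap_sketch(struct drm_gem_object *gobj,
				 struct drm_file *drm_file)
{
	/* Paired with drm_vma_node_revoke() on BO destruction. */
	return drm_vma_node_allow(&gobj->vma_node, drm_file);
}
```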
-
Submitted by Felix Kuehling

amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu needs the drm_priv to allow mmap to access the BO through the corresponding file descriptor. The VM can also be extracted from drm_priv, so drm_priv can replace the vm parameter in the kfd2kgd interface.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
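A sketch of how the VM can be recovered from the DRM file private data, which is why drm_priv alone is enough for both mmap access and VM lookup. It assumes the driver-internal amdgpu_fpriv layout; the exact helper in the driver may differ.

```c
/* Sketch: resolve the per-process amdgpu_vm from the render-node
 * drm_file that KFD received in acquire_vm. */
static struct amdgpu_vm *drm_priv_to_vm_sketch(struct drm_file *drm_priv)
{
	struct amdgpu_fpriv *fpriv = drm_priv->driver_priv;

	return &fpriv->vm;
}
```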
-
- 16 April 2021, 2 commits
-
-
Submitted by Felix Kuehling

ROCm user mode has acquired VMs from DRM file descriptors for as long as it has supported the upstream KFD. Legacy code to support older versions of ROCm is not needed any more.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Eric Huang

Due to changes in the HW memory model, the Aldebaran MTYPEs need to be changed to match.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 24 March 2021, 3 commits
-
-
Submitted by Eric Huang

The flag is only applied to fine-grained memory.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Oak Zeng

With the current KFD memory accounting scheme, KFD applications can use up to 15/16 of total system memory, which leaves little for the OS on systems with small total memory. For example, on a system with 16GB of system memory, this scheme leaves the OS and non-KFD applications only 1GB, which in many cases leads to the OOM killer. This patch changes the KFD system memory accounting scheme to use 15/16 of the free system memory at KFD driver load time, thereby deducting the system memory the OS is already using.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
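A sketch of what the new baseline computation could look like; the function and variable names are assumptions, and the upstream accounting code keeps the result in its own limit fields rather than returning it.

```c
#include <linux/mm.h>

/* Sketch: compute the KFD system-memory budget at driver load as 15/16
 * of the memory that is still free, rather than 15/16 of total RAM, so
 * memory the OS already uses is excluded from the KFD budget. */
static u64 kfd_system_mem_limit_sketch(void)
{
	struct sysinfo si;
	u64 free_bytes;

	si_meminfo(&si);
	free_bytes = (u64)si.freeram * si.mem_unit;

	return free_bytes - (free_bytes >> 4);	/* 15/16 of free memory */
}
```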
-
Submitted by Eric Huang

Support the new cache coherence HW on A+A platforms, mainly in KFD.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 10 March 2021, 1 commit
-
-
Submitted by Yong Zhao

Add initial KFD support.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 10 February 2021, 1 commit
-
-
Submitted by Christian König

This is just another feature which is only used by VMWGFX, so move it into the driver instead. I've tried to add the accounting sysfs file to the kobject of the drm minor, but I'm not 100% sure this works as expected.

v2: fix typo in KFD and avoid 64bit divide
v3: fix init order in VMWGFX
v4: use pdev sysfs reference instead of drm

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Zack Rusin <zackr@vmware.com> (v3)
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210208133226.36955-2-christian.koenig@amd.com
-
- 03 February 2021, 1 commit
-
-
Submitted by Huang Rui

drm_gem_object_free calls the funcs of the DRM buffer object, so kfd_alloc should use amdgpu_gem_object_create instead of amdgpu_bo_create so that funcs is initialized to amdgpu_gem_object_funcs.

[ 396.231390] amdgpu: Release VA 0x7f76b4ada000 - 0x7f76b4add000
[ 396.231394] amdgpu: remove VA 0x7f76b4ada000 - 0x7f76b4add000 in entry 0000000085c24a47
[ 396.231408] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 396.231445] #PF: supervisor read access in kernel mode
[ 396.231466] #PF: error_code(0x0000) - not-present page
[ 396.231484] PGD 0 P4D 0
[ 396.231495] Oops: 0000 [#1] SMP NOPTI
[ 396.231509] CPU: 7 PID: 1352 Comm: clinfo Tainted: G OE 5.11.0-rc2-custom #1
[ 396.231537] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD0401N_Weekly_20_04_0 04/01/2020
[ 396.231563] RIP: 0010:drm_gem_object_free+0xc/0x22 [drm]
[ 396.231606] Code: eb ec 48 89 c3 eb e7 0f 1f 44 00 00 55 48 89 e5 48 8b bf 00 06 00 00 e8 72 0d 01 00 5d c3 0f 1f 44 00 00 48 8b 87 40 01 00 00 <48> 8b 00 48 85 c0 74 0b 55 48 89 e5 e8 54 37 7c db 5d c3 0f 0b c3
[ 396.231666] RSP: 0018:ffffb4704177fcf8 EFLAGS: 00010246
[ 396.231686] RAX: 0000000000000000 RBX: ffff993a0d0cc400 RCX: 0000000000003113
[ 396.231711] RDX: 0000000000000001 RSI: e9cda7a5d0791c6d RDI: ffff993a333a9058
[ 396.231736] RBP: ffffb4704177fdd0 R08: ffff993a03855858 R09: 0000000000000000
[ 396.231761] R10: ffff993a0d1f7158 R11: 0000000000000001 R12: 0000000000000000
[ 396.231785] R13: ffff993a0d0cc428 R14: 0000000000003000 R15: ffffb4704177fde0
[ 396.231811] FS: 00007f76b5730740(0000) GS:ffff993b275c0000(0000) knlGS:0000000000000000
[ 396.231840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 396.231860] CR2: 0000000000000000 CR3: 000000016d2e2000 CR4: 0000000000350ee0
[ 396.231885] Call Trace:
[ 396.231897] ? amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x24c/0x25f [amdgpu]
[ 396.232056] ? __dynamic_dev_dbg+0xcd/0x100
[ 396.232076] kfd_ioctl_free_memory_of_gpu+0x91/0x102 [amdgpu]
[ 396.232214] kfd_ioctl+0x211/0x35b [amdgpu]
[ 396.232341] ? kfd_ioctl_get_queue_wave_state+0x52/0x52 [amdgpu]

Fixes: 246cb7e4 ("drm/amdgpu: Introduce GEM object functions")
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Changfeng <changzhu@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
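A rough sketch of the fix described above, assuming driver-internal headers and an approximate argument list for amdgpu_gem_object_create(); the point is only that allocating through the GEM path sets the object's funcs to amdgpu_gem_object_funcs before the BO is handed back to KFD.

```c
/* Sketch (argument order approximate): allocate the KFD BO through the
 * GEM path so gobj->funcs is valid, then derive the amdgpu_bo from it. */
static int kfd_alloc_gem_bo_sketch(struct amdgpu_device *adev, u64 size,
				   u64 alloc_flags, u32 domain,
				   struct amdgpu_bo **bo_out)
{
	struct drm_gem_object *gobj;
	int ret;

	ret = amdgpu_gem_object_create(adev, size, 1, domain, alloc_flags,
				       ttm_bo_type_device, NULL, &gobj);
	if (ret)
		return ret;

	/* gobj->funcs is initialized, so drm_gem_object_free() won't oops. */
	*bo_out = gem_to_amdgpu_bo(gobj);
	return 0;
}
```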
-
- 02 February 2021, 1 commit
-
-
Submitted by Huang Rui

drm_gem_object_free calls the funcs of the DRM buffer object, so kfd_alloc should use amdgpu_gem_object_create instead of amdgpu_bo_create so that funcs is initialized to amdgpu_gem_object_funcs.

[ 396.231390] amdgpu: Release VA 0x7f76b4ada000 - 0x7f76b4add000
[ 396.231394] amdgpu: remove VA 0x7f76b4ada000 - 0x7f76b4add000 in entry 0000000085c24a47
[ 396.231408] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 396.231445] #PF: supervisor read access in kernel mode
[ 396.231466] #PF: error_code(0x0000) - not-present page
[ 396.231484] PGD 0 P4D 0
[ 396.231495] Oops: 0000 [#1] SMP NOPTI
[ 396.231509] CPU: 7 PID: 1352 Comm: clinfo Tainted: G OE 5.11.0-rc2-custom #1
[ 396.231537] Hardware name: AMD Celadon-RN/Celadon-RN, BIOS WCD0401N_Weekly_20_04_0 04/01/2020
[ 396.231563] RIP: 0010:drm_gem_object_free+0xc/0x22 [drm]
[ 396.231606] Code: eb ec 48 89 c3 eb e7 0f 1f 44 00 00 55 48 89 e5 48 8b bf 00 06 00 00 e8 72 0d 01 00 5d c3 0f 1f 44 00 00 48 8b 87 40 01 00 00 <48> 8b 00 48 85 c0 74 0b 55 48 89 e5 e8 54 37 7c db 5d c3 0f 0b c3
[ 396.231666] RSP: 0018:ffffb4704177fcf8 EFLAGS: 00010246
[ 396.231686] RAX: 0000000000000000 RBX: ffff993a0d0cc400 RCX: 0000000000003113
[ 396.231711] RDX: 0000000000000001 RSI: e9cda7a5d0791c6d RDI: ffff993a333a9058
[ 396.231736] RBP: ffffb4704177fdd0 R08: ffff993a03855858 R09: 0000000000000000
[ 396.231761] R10: ffff993a0d1f7158 R11: 0000000000000001 R12: 0000000000000000
[ 396.231785] R13: ffff993a0d0cc428 R14: 0000000000003000 R15: ffffb4704177fde0
[ 396.231811] FS: 00007f76b5730740(0000) GS:ffff993b275c0000(0000) knlGS:0000000000000000
[ 396.231840] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 396.231860] CR2: 0000000000000000 CR3: 000000016d2e2000 CR4: 0000000000350ee0
[ 396.231885] Call Trace:
[ 396.231897] ? amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x24c/0x25f [amdgpu]
[ 396.232056] ? __dynamic_dev_dbg+0xcd/0x100
[ 396.232076] kfd_ioctl_free_memory_of_gpu+0x91/0x102 [amdgpu]
[ 396.232214] kfd_ioctl+0x211/0x35b [amdgpu]
[ 396.232341] ? kfd_ioctl_get_queue_wave_state+0x52/0x52 [amdgpu]

Fixes: 246cb7e4 ("drm/amdgpu: Introduce GEM object functions")
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Changfeng <changzhu@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 16 December 2020, 1 commit
-
-
Submitted by Yifan Zhang

It could also be insufficient VRAM that makes amdgpu_amdkfd_reserve_mem_limit fail.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 15 December 2020, 1 commit
-
-
Submitted by Daniel Vetter

I guess Christian didn't compile-test amdkfd.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Fixes: e11bfb99 ("drm/ttm: cleanup BO size handling v3")
Cc: Christian König <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com> (v1)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20201214191725.3899147-1-daniel.vetter@ffwll.ch
-
- 02 December 2020, 1 commit
-
-
Submitted by Philip Yang

If VRAM is used up, display VRAM allocations evict KFD BOs to system memory, and KFD schedules restore work to move the BOs back to VRAM. If display BOs are pinned in VRAM, the KFD restore work keeps retrying and may never succeed. If restoring a BO back to VRAM fails, keep the BO in system memory to prevent endless restore retries, and update the GPU mapping to point to system memory.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 03 November 2020, 1 commit
-
-
Submitted by Deepak R Varma

General code indentation and alignment changes, such as replacing spaces with tabs and aligning function arguments per the coding style guidelines. The patch corrects these issues in various amdgpu_*.c files of this driver. Issues reported by the checkpatch script.

Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 27 October 2020, 1 commit
-
-
Submitted by Sumera Priyadarsini

Bool initialisation should use 'true' and 'false' values instead of 0 and 1. Modify amdgpu_amdkfd_gpuvm.c to initialise the variable is_imported to false instead of 0. Issue found with Coccinelle.

Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 10 October 2020, 1 commit
-
-
Submitted by Gustavo A. R. Silva

Make use of the new struct_size() helper instead of the offsetof() idiom.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
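A small self-contained example of the idiom change, using a made-up structure rather than the actual amdgpu one: struct_size() computes the same allocation size as the open-coded offsetof() idiom for a trailing flexible array, but saturates instead of wrapping on integer overflow.

```c
#include <linux/overflow.h>
#include <linux/slab.h>

struct table_like {
	unsigned int count;
	u64 entries[];			/* flexible array member */
};

static struct table_like *table_alloc(unsigned int n)
{
	struct table_like *t;

	/* Before: kmalloc(offsetof(struct table_like, entries[n]), ...)
	 * After: struct_size() does the same size math with overflow
	 * protection. */
	t = kmalloc(struct_size(t, entries, n), GFP_KERNEL);
	if (t)
		t->count = n;
	return t;
}
```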
-
- 24 September 2020, 1 commit
-
-
Submitted by Christian König

Stop using TTM_PL_FLAG_NO_EVICT.

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Link: https://patchwork.freedesktop.org/patch/391617/?series=81973&rev=1
-
- 18 September 2020, 1 commit
-
-
Submitted by Fenghua Yu

PASID is defined as a few different types in iommu, including "int", "u32", and "unsigned int". To be consistent and to match the uapi definitions, define PASID and its variations (e.g. max PASID) as "u32". "u32" is also shorter and a little more explicit than "unsigned int". No PASID type change in uapi, although it defines PASID as __u64 in some places.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/1600187413-163670-2-git-send-email-fenghua.yu@intel.com
-
- 25 August 2020, 1 commit
-
-
Submitted by Luben Tuikov

Get the amdgpu_device from the DRM device by using an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device into a pointer to struct amdgpu_device.

v2: Use a typed, visible static inline function instead of an invisible macro.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
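A sketch of what such a typed helper looks like, assuming the layout at the time of this change where the amdgpu_device hangs off drm_device::dev_private; later kernels embed the drm_device and resolve it with container_of() instead.

```c
#include <drm/drm_device.h>

/* Sketch: typed, visible inline replacing the old invisible macro. */
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
{
	return ddev->dev_private;
}
```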
-
- 15 August 2020, 1 commit
-
-
Submitted by Christian König

The whole approach wasn't thought through to the end. We already had a reset lock like this in the past, and it caused the same problems. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 07 August 2020, 2 commits
-
-
Submitted by Christian König

We need to allocate that manually now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Link: https://patchwork.freedesktop.org/patch/384330/
-
Submitted by Philip Yang

If multiple processes share system memory through /dev/shm, KFD memory allocations should not fail when the system memory limit is reached, because one copy of physical system memory is shared by multiple processes. Add the module parameter no_system_mem_limit to give users the option to disable the system memory limit check, either at runtime using sysfs or during driver module init using a kernel boot argument. By default the system memory limit is on. Print a debug message to warn the user if a KFD memory allocation fails because system memory has reached the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
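A minimal sketch of the module-parameter wiring this describes; the real amdgpu parameter is declared in the driver's module-parameter file, and the limit check in the KFD memory accounting code consults it.

```c
#include <linux/moduleparam.h>

/* Sketch: runtime-toggleable knob to bypass the system memory limit. */
static bool no_system_mem_limit;
module_param(no_system_mem_limit, bool, 0644);
MODULE_PARM_DESC(no_system_mem_limit, "disable system memory limit (default: false)");
```

With 0644 permissions a parameter like this can normally be flipped at runtime through /sys/module/<module>/parameters/no_system_mem_limit or set at boot with <module>.no_system_mem_limit=1, which matches the sysfs/boot-argument behavior described above.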
-
- 29 July 2020, 1 commit
-
-
Submitted by Ahmed S. Darwish

A sequence counter write side critical section must be protected by some form of locking to serialize writers. If the serialization primitive does not disable preemption implicitly, preemption has to be explicitly disabled before entering the sequence counter write side critical section.

The dma-buf reservation subsystem uses plain sequence counters to manage updates to reservations. Writer serialization is accomplished through a wound/wait mutex. Acquiring a wound/wait mutex does not disable preemption, so this needs to be done manually before and after the write side critical section.

Use the newly-added seqcount_ww_mutex_t instead:
- It associates the ww_mutex with the sequence count, which enables lockdep to validate that the write side critical section is properly serialized.
- It removes the need to explicitly add preempt_disable/enable() around the write side critical section because the write_begin/end() functions for this new data type automatically do this.

If lockdep is disabled, this ww_mutex lock association is compiled out and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://lkml.kernel.org/r/20200720155530.1173732-13-a.darwish@linutronix.de
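A sketch of the associated-lock seqcount usage this describes, using the seqcount_ww_mutex_t type introduced by this series; the structure here only loosely mirrors struct dma_resv and the field names are assumptions.

```c
#include <linux/seqlock.h>
#include <linux/ww_mutex.h>

struct resv_like {
	struct ww_mutex lock;
	seqcount_ww_mutex_t seq;	/* associated with 'lock' */
};

static void resv_like_init(struct resv_like *r, struct ww_class *cls)
{
	ww_mutex_init(&r->lock, cls);
	seqcount_ww_mutex_init(&r->seq, &r->lock);
}

static void resv_like_update(struct resv_like *r)
{
	/* Caller must hold r->lock; lockdep checks the association, and
	 * the typed write_seqcount_begin/end handle preemption. */
	write_seqcount_begin(&r->seq);
	/* ... publish new fence pointers ... */
	write_seqcount_end(&r->seq);
}
```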
-
- 28 July 2020, 1 commit
-
-
Submitted by Dennis Li

When the GPU hangs, the driver has multiple paths that enter amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe for other threads to access the GPU, which can cause the GPU reset to fail. Therefore the new rw_semaphore adev->reset_sem is introduced, which protects the GPU from being accessed by external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver.
3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid re-entering GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all code, for example: free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249] dump_stack+0x98/0xd5
[ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833] free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831] ksys_ioctl+0x98/0xb0
[ 1230.204004] __x64_sys_ioctl+0x1a/0x20
[ 1230.205174] do_syscall_64+0x5f/0x250
[ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid re-entering GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove commented-out code in amdgpu_device.c
3. add a more detailed comment in the commit message
4. define a wrapper function amdgpu_in_reset

v5:
1. Fix some style issues.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
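A sketch of the trylock pattern described above for external paths (ioctls, debugfs) that touch the hardware; the wrapper function is hypothetical, driver-internal headers are assumed, and only adev->reset_sem comes from the commit description.

```c
#include <linux/rwsem.h>

/* Sketch: external threads take adev->reset_sem for read so they cannot
 * access the GPU while a recovery thread holds it for write. */
static int hw_access_sketch(struct amdgpu_device *adev)
{
	if (!down_read_trylock(&adev->reset_sem))
		return -EIO;		/* GPU recovery in progress */

	/* ... safe to program registers / submit work here ... */

	up_read(&adev->reset_sem);
	return 0;
}
```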
-
- 01 July 2020, 1 commit
-
-
Submitted by Christian König

According to Marek, a pipeline sync should be inserted for implicit syncs as well.

v2: bump the driver version

Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 10 June 2020, 1 commit
-
-
Submitted by Michel Lespinasse

This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 22 May 2020, 1 commit
-
-
Submitted by Philip Yang

In the free-memory-of-gpu path, remove the BO from the validate_list to make sure the restore worker doesn't access the BO any more, then unregister the BO's MMU interval notifier. Otherwise, the restore worker will crash in the middle of validating BO user pages if the MMU interval notifier is gone.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 09 May 2020, 1 commit
-
-
Submitted by Felix Kuehling

Releasing the AMDGPU BO ref directly leads to problems when BOs were exported as DMA-bufs. Releasing the GEM reference makes sure that the AMDGPU/TTM BO is not freed too early. Also take a GEM reference when importing BOs from DMA-bufs to keep references to imported BOs balanced properly.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
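A sketch of the release-path change described here, assuming driver-internal headers; bo->tbo.base is the GEM object embedded in the TTM BO, and kernels from this era use the _unlocked variant of drm_gem_object_put().

```c
#include <drm/drm_gem.h>

/* Sketch: drop the GEM reference so DRM can handle exported DMA-buf
 * state before the underlying TTM BO is freed. */
static void kfd_release_bo_sketch(struct amdgpu_bo *bo)
{
	/* was: amdgpu_bo_unref(&bo); */
	drm_gem_object_put(&bo->tbo.base);
}
```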
-
- 07 May 2020, 1 commit
-
-
Submitted by Felix Kuehling

Releasing the AMDGPU BO ref directly leads to problems when BOs were exported as DMA-bufs. Releasing the GEM reference makes sure that the AMDGPU/TTM BO is not freed too early. Also take a GEM reference when importing BOs from DMA-bufs to keep references to imported BOs balanced properly.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-