- 21 9月, 2022 2 次提交
-
-
由 Christian König 提交于
Allows submitting jobs as gang which needs to run on multiple engines at the same time. Basic idea is that we have a global gang submit fence representing when the gang leader is finally pushed to run on the hardware last. Jobs submitted as gang are never re-submitted in case of a GPU reset since this won't work and will just deadlock the hardware immediately again. v2: fix logic inversion, improve documentation, fix rcu Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Similar to what we did for VCN3 use the job instead of the parser entity. Cleanup the coding style quite a bit as well. v2: merge improved application check into this patch v3: finally fix the check v4: limit to the correct engine Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 9月, 2022 13 次提交
-
-
由 Christian König 提交于
This reverts commit 250195ff. The job should now be initialized when we reach the parser functions. v2: merge improved application check into this patch v3: back to the original test, but use the right ring Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Initialize the entity for the CS and scheduler job much earlier. v2: fix job initialisation order and use correct scheduler instance Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Return early on success and so remove all those "if (r)" in the error path. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Cleanup the coding style and function names to represent the data they process for pass2 as well. Go over the chunks only twice now instead of multiple times. v2: fix job initialisation order and use correct scheduler instance v3: try to move all functional changes into a separate patch. v4: separate reordering, pass1 and pass2 change v5: fix va_start calculation v6: fix user fence check Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 YiPeng Chai 提交于
V3: Fixed psp fence and memory issues for the asic using smu v13_0_2 when removing amdgpu device. [Why]: 1. psp_suspend->psp_free_shared_bufs-> psp_ta_free_shared_buf-> amdgpu_bo_free_kernel-> ...->amdgpu_bo_release_notify-> amdgpu_fill_buffer psp will free vram memory used by psp when psp_suspend is called. But for the asic using smu v13_0_2, because psp_suspend is called before adev->shutdown is set to true when removing the first hive device, amdgpu fill_buffer will be called, which will cause fence issues when evicting all vram resources in amdgpu vram mgr_fini. 2. Since psp_hw_fini is not called after calling psp_suspend and psp_suspend only calls psp_ring_stop, the psp ring memory will not be released when amdgpu device is removed. [How]: 1. Set shutdown to true before calling amdgpu_device_gpu_recover, then amdgpu_fill_buffer will not be called when psp_suspend is called. 2. Free psp ring memory in psp_sw_fini. Signed-off-by: NYiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 YiPeng Chai 提交于
Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: NYiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yifan Zhang 提交于
This patch addes MES and MES-KIQ version in debugfs. Signed-off-by: NYifan Zhang <yifan1.zhang@amd.com> Reviewed-by: NTim Huang <Tim.Huang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yang Li 提交于
clean up some inconsistent indentings Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2178Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
amdgpu_firmware_info debugfs will show rlcv/rlcp ucode version info Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
add rlc v2_x support to print_rlc_hdr helper Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hawking Zhang 提交于
cache rlcv/rlcvp ucode version info in amdgpu_gfx structure Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mukul Joshi 提交于
This patch updates the PTE flags when translate further (TF) is enabled: - With translate_further enabled, invalid PTEs can be 0. Reading consecutive invalid PTEs as 0 is considered a fault. To prevent this, ensure invalid PTEs have at least 1 bit set. - The current invalid PTE flags settings to translate a retry fault into a no-retry fault, doesn't work with TF enabled. As a result, update invalid PTE flags settings which works for both TF enabled and disabled case. Fixes: 352e683b ("drm/amdgpu: Enable translate_further to extend UTCL2 reach") Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NMukul Joshi <mukul.joshi@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
SDMA update page table may be called from unlocked context, this generate below warning. Use unlocked iterator to handle this case. WARNING: CPU: 0 PID: 1475 at drivers/dma-buf/dma-resv.c:483 dma_resv_iter_next Call Trace: dma_resv_iter_first+0x43/0xa0 amdgpu_vm_sdma_update+0x69/0x2d0 [amdgpu] amdgpu_vm_ptes_update+0x29c/0x870 [amdgpu] amdgpu_vm_update_range+0x2f6/0x6c0 [amdgpu] svm_range_unmap_from_gpus+0x115/0x300 [amdgpu] svm_range_cpu_invalidate_pagetables+0x510/0x5e0 [amdgpu] __mmu_notifier_invalidate_range_start+0x1d3/0x230 unmap_vmas+0x140/0x150 unmap_region+0xa8/0x110 Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Suggested-by: NFelix Kuehling <felix.kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 9月, 2022 3 次提交
-
-
由 Alex Deucher 提交于
Move common IP init before GMC init so that HDP gets remapped before GMC init which uses it. This fixes the Unsupported Request error reported through AER during driver load. The error happens as a write happens to the remap offset before real remapping is done. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This mirrors what we do for other asics and this way we are sure the sdma doorbell range is properly initialized. There is a comment about the way doorbells on gfx9 work that requires that they are initialized for other IPs before GFX is initialized. However, the statement says that it applies to multimedia as well, but the VCN code currently initializes doorbells after GFX and there are no known issues there. In my testing at least I don't see any problems on SDMA. This is a prerequisite for fixing the Unsupported Request error reported through AER during driver load. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
This mirrors what we do for other asics and this way we are sure the ih doorbell range is properly initialized. There is a comment about the way doorbells on gfx9 work that requires that they are initialized for other IPs before GFX is initialized. In this case IH is initialized before GFX, so there should be no issue. This is a prerequisite for fixing the Unsupported Request error reported through AER during driver load. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373 The error was unnoticed before and got visible because of the commit referenced below. This doesn't fix anything in the commit below, rather fixes the issue in amdgpu exposed by the commit. The reference is only to associate this commit with below one so that both go together. Fixes: 8795e182 ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()") Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 9月, 2022 22 次提交
-
-
由 Alex Deucher 提交于
for imu_v11_0_3_program_rlc_ram(). Include imu_v11_0_3.h in imu_v11_0_3.c. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Sort the functions in the order they are called Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Cleanup the coding style and function names to represent the data they process. Only initialize and cleanup the CS structure in init/fini. Check the size of the IB chunk in pass1. v2: fix job initialisation order and use correct scheduler instance v3: try to move all functional changes into a separate patch. v4: move reordering and pass2 out of this patch as well Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Use DMA_RESV_USAGE_BOOKKEEP for VM page table updates and KFD preemption fence. v2: actually update all usages for KFD Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
This reverts commit 94f4c496. We found that the bo_list is missing a protection for its list entries. Since that is fixed now this workaround can be removed again. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Move setting the job resources into amdgpu_job.c Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
We should not have any different CS constrains based on the execution environment. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Candice Li 提交于
No need to reset error status since only umc ras supported on psp v13_0_0. Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Candice Li 提交于
Set correct EEPROM I2C address for smu v13_0_0. Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
copy ras driver to psp if present Signed-off-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Was missing before and would have resulted in a write to a non-existant register. Normally APUs don't use HDP, but other asics could use this code and APUs do use the HDP when used in passthrough. Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingyu Wang 提交于
Fix everything checkpatch.pl complained about in amdgpu_amdkfd_gpuvm.c Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NJingyu Wang <jingyuwang_vip@163.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingyu Wang 提交于
Fix everything checkpatch.pl complained about in amdgpu_amdkfd.c Signed-off-by: NJingyu Wang <jingyuwang_vip@163.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingyu Wang 提交于
This is a patch to the amdgpu_sync.c file that fixes some warnings found by the checkpatch.pl tool Signed-off-by: NJingyu Wang <jingyuwang_vip@163.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingyu Wang 提交于
Fix everything checkpatch.pl complained about in amdgpu_acpi.c Signed-off-by: NJingyu Wang <jingyuwang_vip@163.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 zhang songyi 提交于
Return the sdma_v6_0_start() directly instead of storing it in another redundant variable. Reported-by: NZeal Robot <zealci@zte.com.cn> Signed-off-by: Nzhang songyi <zhang.songyi@zte.com.cn> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Vignesh Chander 提交于
both get_xgmi_hive and put_xgmi_hive can be skipped since the reset domain is not necessary for VF Signed-off-by: NVignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: NShaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yang Wang 提交于
align TMR BO size TO tmr size is not necessary, modify the size to 1M to avoid re-create BO fail when serious VRAM fragmentation. v2: add new macro PSP_TMR_ALIGNMENT for TMR BO alignment size Signed-off-by: NYang Wang <KevinYang.Wang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Candice Li 提交于
Enable full reset for RAS supported configuration on gc v11_0_0. v2: simplify the code. Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Candice Li 提交于
Only check MCUMC_STATUS for CE counter for umc v8_10. Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
For SRIOV configuration, host driver control the reset method(either FLR or heavier chain reset). The host will notify the guest individually with FLR message if individual GPU within the hive need to be reset. So for guest side, no need to use hive->reset_domain to replace the original per device reset_domain Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Hamza Mahfooz 提交于
Currently, we aren't handling DRM_IOCTL_MODE_DIRTYFB. So, use drm_atomic_helper_dirtyfb() as the dirty callback in the amdgpu_fb_funcs struct. Signed-off-by: NHamza Mahfooz <hamza.mahfooz@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-