- 24 3月, 2021 4 次提交
-
-
由 Eric Huang 提交于
To support new cache coherence HW on A+A platform mainly in KFD. Signed-off-by: NEric Huang <jinhuieric.huang@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
Aldebaran should be the same as Arcturus in the PTE SNOOPED bit handling. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rajneesh Bhardwaj 提交于
This applies to AMD Accelerated Processing Platforms that support host gpu interconnect throguh a special link (xgmi). Aldebaran systems will support this special feature for utilizing the benefits of host-gpu cache coherence. This change outlines the basic framework for mapping the GPU VRAM (HBM) to system address space making it accesible to the host but managed by the amdgpu driver since this region is marked as reserved memory in host address space by the underlying system firmware. v2: switch to smuio callback function to check the type of host-gpu interface (Hawking) v3: use hub callbacks rather than direct function calls (Alex) Reviewed-by: NOak Zeng <oak.zeng@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NRajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
This gives more information and improves productivity. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 3月, 2021 1 次提交
-
-
由 Le Ma 提交于
Add gfx memory controller support Signed-off-by: NLe Ma <le.ma@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Acked-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 2月, 2021 1 次提交
-
-
由 Alex Deucher 提交于
The hw interface changed on arcturus so the old numbering scheme doesn't work. Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 1月, 2021 2 次提交
-
-
由 Likun Gao 提交于
Remove hdp_flush function from amdgpu_nbio struct as it have been unified into hdp struct. Remove the include about hdp register which was not used. V2: Remove hdp golden setting which is unnecessary. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Switch to use the HDP functions which unified on hdp structure instead of the scattered hdp callback functions. V2: clean up hdp reset ras error count function. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 12月, 2020 1 次提交
-
-
由 Hawking Zhang 提交于
The number of crtc should be 0 for ASICs that don't have display engine. Remove the unnecessary asic type check then. Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 12月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
No longer used. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 11月, 2020 5 次提交
-
-
由 Lee Jones 提交于
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:382:23: warning: ‘ecc_umc_mcumc_status_addrs’ defined but not used [-Wunused-const-variable=] drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:720: warning: Function parameter or member 'vmhub' not described in 'gmc_v9_0_flush_gpu_tlb' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:836: warning: Function parameter or member 'flush_type' not described in 'gmc_v9_0_flush_gpu_tlb_pasid' drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:836: warning: Function parameter or member 'all_hub' not described in 'gmc_v9_0_flush_gpu_tlb_pasid' Acked-by: NChristian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lee Jones 提交于
Fixes the following W=1 kernel build warning(s): drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c:382:23: warning: ‘ecc_umc_mcumc_status_addrs’ defined but not used [-Wunused-const-variable=] Acked-by: NChristian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
Looks like we can't enabled the IH1/IH2 feature for Vega20, make sure retry faults are handled on a separate ring anyway. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Christian König 提交于
The address space is only 48bit, not 64bit. And the VMHUBs work with sign extended addresses. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Gustavo A. R. Silva 提交于
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple break statements instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 30 10月, 2020 1 次提交
-
-
由 Christian König 提交于
First of all don't snprintf into a char buffer allocated on the stack with a constant hubname. Then cleanup to exit the function early in case of a ratelimit or SRIOV. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 10月, 2020 1 次提交
-
-
由 Kevin Wang 提交于
remove duplicate gfxhub v1.1 function set. put function of gfxhub_v1_1_get_xgmi_info to gfxhub v1_0 function set. Signed-off-by: NKevin Wang <kevin1.wang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 10月, 2020 2 次提交
-
-
由 Guchun Chen 提交于
The same ECC check has been executed in amdgpu_ras_init for vega10, prior to gmc_v9_0_late_init. v2: drop all atombios helper callings v3: use bit operation v4: correct inline comment, remove parity check statement v5: squash in build fix Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
gfxhub functions are now called from function pointers, instead of from asic-specific functions. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 9月, 2020 1 次提交
-
-
由 Liu Shixin 提交于
Simplify the return expression. Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 9月, 2020 1 次提交
-
-
由 Shirish S 提交于
With IOMMU enabled, if SDPIF_MMIO_CNTRL_0 is not set appropriately the system hangs without any trace during S3. To ease debug and to ensure that the failure, if any, was caused by a race conditions that disabled write access to SDPIF_MMIO_CNTRL_0 register, warn the user about it. Signed-off-by: NShirish S <shirish.s@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 9月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Copy paste typo. Reported-by: Nkernel test robot <lkp@intel.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 9月, 2020 3 次提交
-
-
由 Alex Deucher 提交于
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Print the name of the client rather than the number. This makes it easier to debug what block is causing the fault. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dennis Li 提交于
When GPU is in reset, its status isn't stable and ring buffer also need be reset when resuming. Therefore driver should protect GPU recovery thread from ring buffer accessed by other threads. Otherwise GPU will randomly hang during recovery. v2: correct indent Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 8月, 2020 2 次提交
-
-
由 Alex Deucher 提交于
We need to restore some registers prior to running asic init to work around a firmware bug. Acked-by: NNirmoy Das <nirmoy.das@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Properly define this register using a relative offset rather than an absolute offset and use the proper SOC15 macros to access it. It's also DCN, not DCE, so remove it from the DCE12 header. No functional change. Acked-by: NNirmoy Das <nirmoy.das@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 8月, 2020 2 次提交
-
-
由 Dennis Li 提交于
Using dev_xxx instead of DRM_xxx/pr_xxx to indicate which device of a hive is the message for. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dennis Li 提交于
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 8月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 15 8月, 2020 2 次提交
-
-
由 Christian König 提交于
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1a. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific implementation of most mmhub functions are called from a general function pointer, instead of calling different function for different ASIC. Simplify the code by deleting duplicate functions Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 8月, 2020 4 次提交
-
-
由 Alex Deucher 提交于
The new helper centralizes the logic in one place. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Rather than leaving this as a gmc v9 specific hack. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Since that is where we store the other data related to the stolen vga memory. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Rather than open coding it everywhere. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 7月, 2020 1 次提交
-
-
由 Huang Rui 提交于
This patch updates to use register distance member instead of hardcode in GMC9. Signed-off-by: NHuang Rui <ray.huang@amd.com> Tested-by: NAnZhong Huang <anzhong.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 7月, 2020 2 次提交
-
-
由 Felix Kuehling 提交于
When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-