- 05 8月, 2020 2 次提交
-
-
由 Alex Deucher 提交于
Rather than open coding it everywhere. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
enabled GECC error injection and query support Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 31 7月, 2020 1 次提交
-
-
由 John Clements 提交于
add support for umc 8.7 initialization add umc 8.7 source to makefile Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 7月, 2020 4 次提交
-
-
由 Huang Rui 提交于
All gc/mmhub register access and operation should be in gfxhub/mmhub level. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
This patch is to move get_invalidate_req into gfxhub/mmhub level. It will avoid mismatch of the different gfxhub/mmhub register offsets and fields in the same gmc block. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
This patch is to introduce vmhub funcs helper to add following callback (print_l2_protection_fault_status). Each GC/MMHUB register specific programming should be in gfxhub/mmhub level. v2: remove the condition of funcs assignment. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
This patch is to add set_vm_fault_masks helper to amdgpu_gmc to refine the original programming. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 7月, 2020 3 次提交
-
-
由 Jiansong Chen 提交于
The athub version used for navy_flounder is v2.1. Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jiansong Chen 提交于
navy_flounder has similar gc IP version with sienna_cichlid, follow its setting for the moment. Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: NTao Zhou <Tao.Zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jiansong Chen 提交于
Same as navi10. Signed-off-by: NJiansong Chen <Jiansong.Chen@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 7月, 2020 1 次提交
-
-
由 Huang Rui 提交于
This patch updates to use register distance member instead of hardcode in GMC10. Signed-off-by: NHuang Rui <ray.huang@amd.com> Tested-by: NAnZhong Huang <anzhong.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 7月, 2020 1 次提交
-
-
由 Felix Kuehling 提交于
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 7月, 2020 3 次提交
-
-
由 John Clements 提交于
support for setting up XGMI FB address regions Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 shaoyunl 提交于
On SRIOV run time, driver shouldn't directly access invalidation registers through MMIO. Use kiq to submit wait_reg_mem package for the invalidation Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Add gmc clockgating support for sienna_cichlid. The athub version used for sienna_cichlid is v2.1. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 6月, 2020 2 次提交
-
-
由 Likun Gao 提交于
Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Likun Gao 提交于
Same as navi10. Signed-off-by: NLikun Gao <Likun.Gao@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 5月, 2020 1 次提交
-
-
由 Alan Swanson 提交于
Try to resize BAR0 to let CPU access all of VRAM on Navi. Syncs code with previous gfx generations from commit d6895ad3 ("drm/amdgpu: resize VRAM BAR for CPU access v6"). Signed-off-by: NAlan Swanson <reiver@improbability.net> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 4月, 2020 1 次提交
-
-
由 Christian König 提交于
Fix the coding style, move and rename the definitions to better match what they are supposed to be doing. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 4月, 2020 2 次提交
-
-
由 Colin Ian King 提交于
Currently the error returns paths are unlocking lock kiq->ring_lock however it seems this should be dev->gfx.kiq.ring_lock as this is the lock that is being locked and unlocked around the ring operations. This looks like a bug, but it's not. The kiq is just a local variable pointing to the same structure. Make it consistent. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yintian Tao 提交于
Wait for the oldest sequence on the ring to be signaled in order to make sure there will be no command overrun. v2: fix coding stype and remove abs operation v3: remove the initialization of variable r Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2020 1 次提交
-
-
由 Oak Zeng 提交于
UTCL2 client ID is useful information to get which UTCL2 client caused the gpuvm fault. Print it out for debug purpose Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NChristian Konig <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 02 4月, 2020 1 次提交
-
-
由 xinhui pan 提交于
We have three ib pools, they are normal, VM, direct pools. Any jobs which schedule IBs without dependence on gpu scheduler should use DIRECT pool. Any jobs schedule direct VM update IBs should use VM pool. Any other jobs use NORMAL pool. v2: squash in coding style fix Signed-off-by: Nxinhui pan <xinhui.pan@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 1月, 2020 2 次提交
-
-
由 Felix Kuehling 提交于
The flush_type was incorrectly hard-coded to 0 when calling falling back to MMIO-based invalidation in flush_gpu_tlb_pasid. Fixes: ea930000 ("drm/amdgpu: export function to flush TLB via pasid") Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
Use a more meaningful variable name for the invalidation request that is distinct from the tmp variable that gets overwritten when acquiring the invalidation semaphore. Fixes: 4ed8a037 ("drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10") Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 1月, 2020 2 次提交
-
-
由 Tianci.Yin 提交于
This reverts commit 9e441478. The patch will be replaced with a better solution, revert it. Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NTianci.Yin <tianci.yin@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Sierra 提交于
[Why] PM4 packet size for flush message was oversized. [How] Packet size adjusted to allocate flush + fence packets. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 1月, 2020 2 次提交
-
-
由 Tianci.Yin 提交于
[why] In dual GPUs scenario, stolen_size is assigned to zero on the secondary GPU, since there is no pre-OS console using that memory. Then the bottom region of VRAM was allocated as GTT, unfortunately a small region of bottom VRAM was encroached by UMC firmware during GDDR6 BIST training, this cause page fault. [how] Forcing stolen_size to 3MB, then the bottom region of VRAM was allocated as stolen memory, GTT corruption avoid. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com> Signed-off-by: NTianci.Yin <tianci.yin@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Sierra 提交于
This can be used directly from amdgpu and amdkfd to invalidate TLB through pasid. It supports gmc v7, v8, v9 and v10. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 1月, 2020 3 次提交
-
-
由 Alex Deucher 提交于
We don't need to store the pre-OS console memory after the driver has loaded so free it. Reviewed-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Leftover from bring up. We look up the actual pre-OS memory usage value later in the same function. Reviewed-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Nirmoy Das 提交于
Do not ignore amdgpu_irq_add_id return value while registering VMC page fault interrupt. Signed-off-by: NNirmoy Das <nirmoy.das@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 1月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Rather than open coding it. This also changes the free masks to better reflect the usage by other components. Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 12月, 2019 1 次提交
-
-
由 changzhu 提交于
It may fail to load guest driver in round 2 when using invalidate semaphore for SRIOV. So it needs to avoid using invalidate semaphore for SRIOV. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 12月, 2019 1 次提交
-
-
由 Monk Liu 提交于
otherwse the flush_gpu_tlb will hang if we unload the KMD becuase the schedulers already stopped Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 12月, 2019 1 次提交
-
-
由 Monk Liu 提交于
otherwse the flush_gpu_tlb will hang if we unload the KMD becuase the schedulers already stopped Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 11月, 2019 2 次提交
-
-
由 changzhu 提交于
It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire before invalidation and semaphore release after invalidation. After adding semaphore acquire before invalidation, the semaphore register become read-only if another process try to acquire semaphore. Then it will not be able to release this semaphore. Then it may cause deadlock problem. If this deadlock problem happens, it needs a semaphore firmware fix. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 changzhu 提交于
It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire before invalidation and semaphore release after invalidation. After adding semaphore acquire before invalidation, the semaphore register become read-only if another process try to acquire semaphore. Then it will not be able to release this semaphore. Then it may cause deadlock problem. If this deadlock problem happens, it needs a semaphore firmware fix. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 11月, 2019 1 次提交
-
-
由 changzhu 提交于
The GRBM register interface is now capable of bursting 1 cycle per register wr->wr, wr->rd much faster than previous muticycle per transaction done interface. This has caused a problem where status registers requiring HW to update have a 1 cycle delay, due to the register update having to go through GRBM. For cp ucode, it has realized dummy read in cp firmware.It covers the use of WAIT_REG_MEM operation 1 case only.So it needs to call gfx_v10_0_wait_reg_mem in gfx10. Besides it also needs to add warning to update firmware in case firmware is too old to have function to realize dummy read in cp firmware. For sdma ucode, it hasn't realized dummy read in sdma firmware. sdma is moved to gfxhub in gfx10. So it needs to add dummy read in driver between amdgpu_ring_emit_wreg and amdgpu_ring_emit_reg_wait for sdma_v5_0. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-