- 05 8月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Rather than open coding it everywhere. Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 7月, 2020 1 次提交
-
-
由 Huang Rui 提交于
This patch updates to use register distance member instead of hardcode in GMC9. Signed-off-by: NHuang Rui <ray.huang@amd.com> Tested-by: NAnZhong Huang <anzhong.huang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 7月, 2020 2 次提交
-
-
由 Felix Kuehling 提交于
When there is no graphics support, KFD can use more of the VMIDs. Graphics VMIDs are only used for video decoding/encoding and post processing. With two VCE engines, there is no reason to reserve more than 2 VMIDs for that. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
The KFD VMID assignment was hard-coded in a few places. Consolidate that in a single variable adev->vm_manager.first_kfd_vmid. The value is still assigned in gmc-ip-version-specific code. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 5月, 2020 1 次提交
-
-
由 Alex Deucher 提交于
Add some APU flags to simplify handling of different APU variants. It's easier to understand the special cases if we use names flags rather than checking device ids and silicon revisions. v2: rebase on latest code Acked-by: NEvan Quan <evan.quan@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 4月, 2020 2 次提交
-
-
由 Colin Ian King 提交于
Currently the error returns paths are unlocking lock kiq->ring_lock however it seems this should be dev->gfx.kiq.ring_lock as this is the lock that is being locked and unlocked around the ring operations. This looks like a bug, but it's not. The kiq is just a local variable pointing to the same structure. Make it consistent. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yintian Tao 提交于
Wait for the oldest sequence on the ring to be signaled in order to make sure there will be no command overrun. v2: fix coding stype and remove abs operation v3: remove the initialization of variable r Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 09 4月, 2020 1 次提交
-
-
由 Oak Zeng 提交于
UTCL2 client ID is useful information to get which UTCL2 client caused the gpuvm fault. Print it out for debug purpose Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NChristian Konig <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 3月, 2020 1 次提交
-
-
由 Guchun Chen 提交于
RAS support capability needs to be updated on top of different memeory ECC enablement, and remove redundant memory ecc check in gmc module for vega20 and arcturus. v2: check HBM ECC enablement and set ras mask accordingly. v3: avoid to invoke atomfirmware interface to query twice. Suggested-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 3月, 2020 1 次提交
-
-
由 Hawking Zhang 提交于
MMHUB ras error counters are dirty ones after cold reboot Read operation is needed to reset them to 0 Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 2月, 2020 3 次提交
-
-
由 Shirish S 提交于
fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM. Suggested-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NShirish S <shirish.s@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Felix Kuehling 提交于
Using a heavy-weight TLB flush once is not sufficient. Concurrent memory accesses in the same TLB cache line can re-populate TLB entries from stale texture cache (TC) entries while the heavy-weight TLB flush is in progress. To fix this race condition, perform another TLB flush after the heavy-weight one, when TC is known to be clean. Move the workaround into the low-level TLB flushing functions. This way they apply to amdgpu as well, and KIQ-based TLB flush only needs to synchronize once. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: Nshaoyun liu <shaoyun.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Shirish S 提交于
fixes S3 issue with IOMMU + S/G enabled @ 64M VRAM. Suggested-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NShirish S <shirish.s@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 1月, 2020 2 次提交
-
-
由 Felix Kuehling 提交于
The flush_type was incorrectly hard-coded to 0 when calling falling back to MMIO-based invalidation in flush_gpu_tlb_pasid. Fixes: ea930000 ("drm/amdgpu: export function to flush TLB via pasid") Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
Use a more meaningful variable name for the invalidation request that is distinct from the tmp variable that gets overwritten when acquiring the invalidation semaphore. Fixes: 4ed8a037 ("drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10") Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 1月, 2020 1 次提交
-
-
由 Alex Sierra 提交于
[Why] PM4 packet size for flush message was oversized. [How] Packet size adjusted to allocate flush + fence packets. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 1月, 2020 1 次提交
-
-
由 Alex Sierra 提交于
This can be used directly from amdgpu and amdkfd to invalidate TLB through pasid. It supports gmc v7, v8, v9 and v10. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 1月, 2020 1 次提交
-
-
由 Joseph Greathouse 提交于
The only data fabric information the adev struct currently contains is a function pointer table. In the near future, we will be adding some cached DF information into adev. As such, this patch creates a new amdgpu_df struct for adev. Right now, it only containst the old function pointer table, but new stuff will be added soon. Signed-off-by: NJoseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 1月, 2020 4 次提交
-
-
由 Alex Deucher 提交于
So it can be shared with newer GMC versions. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Zhigang Luo 提交于
Signed-off-by: NZhigang Luo <zhigang.luo@amd.com> Signed-off-by: NJane Jian <jane.jian@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Zhigang Luo 提交于
Signed-off-by: NZhigang Luo <zhigang.luo@amd.com> Signed-off-by: NJane Jian <jane.jian@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Zhigang Luo 提交于
Signed-off-by: NZhigang Luo <zhigang.luo@amd.com> Signed-off-by: NJane Jian <jane.jian@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 12月, 2019 1 次提交
-
-
由 John Clements 提交于
Devices newer then VEGA10/12 shall have these programming sequences performed by PSP BL Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 13 12月, 2019 1 次提交
-
-
由 changzhu 提交于
It may fail to load guest driver in round 2 or cause Xstart problem when using invalidate semaphore for SRIOV or picasso. So it needs avoid using invalidate semaphore for SRIOV and picasso. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 12 12月, 2019 3 次提交
-
-
由 changzhu 提交于
It may cause timeout waiting for sem acquire in VM flush when using invalidate semaphore for picasso. So it needs to avoid using invalidate semaphore for piasso. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 John Clements 提交于
Updated UMC 6.1 function set to support UMC 6.1.1 and 6.1.2 devices Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 changzhu 提交于
It may cause timeout waiting for sem acquire in VM flush when using invalidate semaphore for picasso. So it needs to avoid using invalidate semaphore for piasso. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 11月, 2019 3 次提交
-
-
由 changzhu 提交于
It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire before invalidation and semaphore release after invalidation. After adding semaphore acquire before invalidation, the semaphore register become read-only if another process try to acquire semaphore. Then it will not be able to release this semaphore. Then it may cause deadlock problem. If this deadlock problem happens, it needs a semaphore firmware fix. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 changzhu 提交于
It may lose gpuvm invalidate acknowldege state across power-gating off cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire before invalidation and semaphore release after invalidation. After adding semaphore acquire before invalidation, the semaphore register become read-only if another process try to acquire semaphore. Then it will not be able to release this semaphore. Then it may cause deadlock problem. If this deadlock problem happens, it needs a semaphore firmware fix. Signed-off-by: Nchangzhu <Changfeng.Zhu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Dennis Li 提交于
Get mmhub error counter by accessing EDC_CNT registers. v2: Add mmhub_v9_4_ prefix for local static variable and function Signed-off-by: NDennis Li <dennis.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 11月, 2019 1 次提交
-
-
由 Hawking Zhang 提交于
reuse vg20 umc functions for arcturus umc ras Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NJohn Clements <John.Clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 10月, 2019 2 次提交
-
-
由 Oak Zeng 提交于
This allows gfx cache to be probed and invalidated (for none-dirty cache lines) on a HDP write (from either another GPU or CPU). This should work only for the memory mapped as RW memory type newly added for arcturus, to achieve some cache coherence b/t multiple memory clients. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Acked-by: NChristian Konig <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Many logic in this function are HDP set up, not gart set up. Moved those logic to gmc_v9_0_hw_init. No functional change. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Acked-by: NChristian konig <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 10月, 2019 1 次提交
-
-
由 Ori Messinger 提交于
The vram vendor can be found as a separate sysfs file at: /sys/class/drm/card[X]/device/mem_info_vram_vendor The vram vendor is displayed as a string value. v2: Use correct bit masking, and cache vram_vendor in gmc v3: Drop unused functions for vram width, type, and vendor Signed-off-by: NOri Messinger <ori.messinger@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 10月, 2019 5 次提交
-
-
由 Jack Zhang 提交于
Add ip block setting for Arcturus SRIOV 1.PSP need to be initialized before IH. 2.SMU doesn't need to be initialized at kmd driver. 3.Arcturus doesn't support DCE hardware,it needs to skip register access to DCE. Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Reviewed-by: NLe Ma <Le.Ma@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
common gmc_ecc_late_init can be shared among all generations of gmc v2: rename gmc_ecc_late_init to gmc_ras_late_init Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
umc_ras_late_init can get the info by itself Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
gmc_ras_fini can be shared among all generations of gmc Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
mmhub_ras_if is relevant to mmhub Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-