- 17 12月, 2021 1 次提交
-
-
由 Victor Skvortsov 提交于
We want to be able to call virt data exchange conditionally after gmc sw init to reserve bad pages as early as possible. Since this is a conditional call, we will need to call it again unconditionally later in the init sequence. Refactor the data exchange function so it can be called multiple times without re-initializing the work item. v2: Cleaned up the code. Kept the original call to init_exchange_data() inside early init to initialize the work item, afterwards call exchange_data() when needed. Signed-off-by: NVictor Skvortsov <victor.skvortsov@amd.com> Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 5月, 2021 1 次提交
-
-
由 Bokun Zhang 提交于
- Update SRIOV PF2VF header with latest revision - Extend existing function in amdgpu_virt.c to read MM bandwidth config from PF2VF message - Add SRIOV Sienna Cichlid codec array and update the bandwidth with PF2VF message v2: squash in removal of unused variable (Alex) Signed-off-by: NBokun Zhang <bokun.zhang@amd.com> Reviewed-by: NMonk liu <monk.liu@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 4月, 2021 3 次提交
-
-
由 Peng Ju Zhou 提交于
using the control bits got from host to control registers access. Signed-off-by: NPeng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: NEmily.Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Peng Ju Zhou 提交于
unify host driver and guest driver indirect access control bits names Signed-off-by: NPeng Ju Zhou <PengJu.Zhou@amd.com> Reviewed-by: NEmily.Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rohit Khaire 提交于
Add 3 sub flags to notify guest for indirect reg access of gc, mmhub and ih The host sets these flags depending on L1 RAP version, asic and other scenarios. These flags ensure that there is compatibility between different guest/host/vbios versions. Signed-off-by: NRohit Khaire <rohit.khaire@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 9月, 2020 2 次提交
-
-
由 Bokun Zhang 提交于
- Refactor the driver code to use amdgpu_virt_read_pf2vf_data and amdgpu_virt_write_vf2pf_data instead of writing all code in one function (which is the old amdgpu_virt_init_data_exchange) - Adding a new transaction method for VF2PF message between host and guest driver. Guest side will periodically update VF2PF message in the framebuffer. In the new header, we include guest ucode information, guest framebuffer usage, and engine usage - Clean up the old macros since they will cause compile error if the new transaction method is used v2: squash in build fix Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Bokun Zhang 提交于
- Update guest side VF2PF interface header file Signed-off-by: NBokun Zhang <Bokun.Zhang@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 8月, 2020 1 次提交
-
-
由 Dennis Li 提交于
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 8月, 2020 1 次提交
-
-
由 Christian König 提交于
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1a. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 7月, 2020 1 次提交
-
-
由 Stanley.Yang 提交于
v1: rename some functions name, only init ras error handler data for supported asic. v2: fix potential memory leak. Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 5月, 2020 1 次提交
-
-
由 Kevin Wang 提交于
the swsmu or powerplay(hwmgr) need to handle task according to different VF mode, this function to help query vf mode. vf mode: 1. SRIOV_VF_MODE_BARE_METAL: the driver work on host OS (PF) 2. SRIOV_VF_MODE_ONE_VF : the driver work on guest OS with one VF 3. SRIOV_VF_MODE_MULTI_VF : the driver work on guest OS with multi VF Signed-off-by: NKevin Wang <kevin1.wang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 4月, 2020 2 次提交
-
-
由 Yintian Tao 提交于
If there is no GPU hang, user still can access debugfs through kiq. Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NMonk Liu <Monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yintian Tao 提交于
Under bare metal, there is no more else to take care of the GPU register access through MMIO. Under Virtualization, to access GPU register is implemented through KIQ during run-time due to world-switch. Therefore, under SR-IOV user can only access debugfs to r/w GPU registers when meets all three conditions below. - amdgpu_gpu_recovery=0 - TDR happened - in_gpu_reset=0 v2: merge amdgpu_virt_can_access_debugfs() into amdgpu_virt_enable_access_debugfs() v3: drop ret variable in amdgpu_virt_enable_access_debugfs() and directly return result Signed-off-by: NYintian Tao <yttao@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 02 4月, 2020 3 次提交
-
-
由 Monk Liu 提交于
1) modify xgpu_nv_send_access_requests to support new idh request 2) introduce new function: req_gpu_init_data() which is used to notify host to prepare vbios/ip-discovery/pfvf exchange Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Monk Liu 提交于
we need to move virt detection much earlier because: 1) HW team confirms us that RCC_IOV_FUNC_IDENTIFIER will always be at DE5 (dw) mmio offset from vega10, this way there is no need to implement detect_hw_virt() routine in each nbio/chip file. for VI SRIOV chip (tonga & fiji), the BIF_IOV_FUNC_IDENTIFIER is at 0x1503 2) we need to acknowledged we are SRIOV VF before we do IP discovery because the IP discovery content will be updated by host everytime after it recieved a new coming "REQ_GPU_INIT_DATA" request from guest (there will be patches for this new handshake soon). Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Monk Liu 提交于
Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 3月, 2020 1 次提交
-
-
由 Monk Liu 提交于
what changed: 1)provide new implementation interface for the rlcg access path 2)put SQ_CMD/SQ_IND_INDEX to GFX9 RLCG path to let debugfs's reg_op function can access reg that need RLCG path help now even debugfs's reg_op can used to dump wave. tested-by: NMonk Liu <monk.liu@amd.com> tested-by: NZhou pengju <pengju.zhou@amd.com> Signed-off-by: NZhou pengju <pengju.zhou@amd.com> Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 1月, 2020 1 次提交
-
-
由 chen gong 提交于
Move amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg function to amdgpu_gfx.c, and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg.Make it generic and flexible. Signed-off-by: Nchen gong <curry.gong@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 12 12月, 2019 1 次提交
-
-
由 Yintian Tao 提交于
Originally, due to the restriction from PSP and SMU, VF has to send message to hypervisor driver to handle powerplay change which is complicated and redundant. Currently, SMU and PSP can support VF to directly handle powerplay change by itself. Therefore, the old code about the handshake between VF and PF to handle powerplay will be removed and VF will use new the registers below to handshake with SMU. mmMP1_SMN_C2PMSG_101: register to handle SMU message mmMP1_SMN_C2PMSG_102: register to handle SMU parameter mmMP1_SMN_C2PMSG_103: register to handle SMU response v2: remove module parameter pp_one_vf v3: fix the parens v4: forbid vf to change smu feature v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute v6: change skip condition at vega10_copy_table_to_smc Signed-off-by: NYintian Tao <yttao@amd.com> Acked-by: NEvan Quan <evan.quan@amd.com> Reviewed-by: NKenneth Feng <kenneth.feng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 02 8月, 2019 1 次提交
-
-
由 Monk Liu 提交于
we can simplify all those unnecessary function under SRIOV for vega10 since: 1) PSP L1 policy is by force enabled in SRIOV 2) original logic always set all flags which make itself a dummy step besides, 1) the ih_doorbell_range set should also be skipped for VEGA10 SRIOV. 2) the gfx_common registers should also be skipped for VEGA10 SRIOV. Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 22 6月, 2019 1 次提交
-
-
由 Jack Xiao 提交于
For new submission ib, CE/DE metadata should be programmed to 0; for partially execution ib, CE/DE metadata should be restored. Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NJack Xiao <Jack.Xiao@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 5月, 2019 1 次提交
-
-
由 Trigger Huang 提交于
Set different register access mode according to the features provided by firmware Signed-off-by: NTrigger Huang <Trigger.Huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 4月, 2019 1 次提交
-
-
由 Yintian Tao 提交于
Under vega10 virtualuzation, smu ip block will not be added. Therefore, we need add pp clk query and force dpm level function at amdgpu_virt_ops to support the feature. v2: add get_pp_clk existence check and use kzalloc to allocate buf v3: return -ENOMEM for allocation failure and correct the coding style Signed-off-by: NYintian Tao <yttao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 11月, 2018 1 次提交
-
-
由 Emily Deng 提交于
Aligned the amd_sriov_msg_pf2vf_info_header and amd_sriov_msg_pf2vf_info_header's definition with libgv. Signed-off-by: NEmily Deng <Emily.Deng@amd.com> Reviewed-by: NFrank.Min <Frank.Min@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 11月, 2018 4 次提交
-
-
由 Christian König 提交于
Move the kiq handling into amdgpu_virt.c and drop the fallback. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NEmily Deng <Emily.Deng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rex Zhu 提交于
In baremetal, also need to reserve csa for preemption. so move the csa related code out of sriov. Reviewed-by: NMonk Liu <Monk.Liu@amd.com> Signed-off-by: NRex Zhu <Rex.Zhu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rex Zhu 提交于
There is no functional changes, Use function arguments for SRIOV special variables which is hardcode in those functions. so we can share those functions in baremetal. Reviewed-by: NMonk Liu <Monk.Liu@amd.com> Signed-off-by: NRex Zhu <Rex.Zhu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Rex Zhu 提交于
driver didn't use this address so far. Reviewed-by: NMonk Liu <Monk.Liu@amd.com> Signed-off-by: NRex Zhu <Rex.Zhu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 2月, 2018 1 次提交
-
-
由 Christian König 提交于
Move the CSA area to the top of the VA space to avoid clashing with HMM/ATC in the lower range on GFX9. v2: wrong sign noticed by Roger, rebase on CSA_VADDR cleanup, handle VA hole on GFX9 as well. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 07 12月, 2017 1 次提交
-
-
由 Monk Liu 提交于
instead of doing it in each GFX ip's sw_fini Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 12月, 2017 5 次提交
-
-
由 Monk Liu 提交于
Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Monk Liu 提交于
since now gpu reset is unified with gpu_recover for both bare-metal and SR-IOV: 1)rename in_sriov_reset to in_gpu_reset 2)move lock_reset from adev->virt to adev Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Monk Liu 提交于
1,new imple names amdgpu_gpu_recover which gives more hint on what it does compared with gpu_reset 2,gpu_recover unify bare-metal and SR-IOV, only the asic reset part is implemented differently 3,gpu_recover will increase hang job karma and mark its entity/context as guilty if exceeds limit V2: 4,in scheduler main routine the job from guilty context will be immedialy fake signaled after it poped from queue and its fence be set with "-ECANCELED" error 5,in scheduler recovery routine all jobs from the guilty entity would be dropped 6,in run_job() routine the real IB submission would be skipped if @skip parameter equales true or there was VRAM lost occured. V3: 7,replace deprecated gpu reset, use new gpu recover Signed-off-by: NMonk Liu <Monk.Liu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 pding 提交于
Driver can use this interface to check if there's a function level reset done in hypervisor. It's helpful when IRQ handler for reset is not ready, or special handling is required. Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: Npding <Pixel.Ding@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 pding 提交于
MMIO space can be blocked on virtualised device. Add this function to check if MMIO is blocked or not. Todo: need a reliable method such like communation with hypervisor. v2: - add comments inline Signed-off-by: Npding <Pixel.Ding@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 10月, 2017 1 次提交
-
-
由 Horace Chen 提交于
SR-IOV need to exchange some data between PF&VF through shared VRAM PF will copy some necessary firmware and information to the shared VRAM. It also requires some information from VF. PF will send a key through mailbox2 to help guest calculate checksum so that it can verify whether the data is correct. So check the data on the specified offset of the shared VRAM, if the checksum is right, read values from it and write some VF information next to the data from PF. Signed-off-by: NHorace Chen <horace.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 9月, 2017 1 次提交
-
-
由 Alex Deucher 提交于
The error handling for virtual functions assumed a single vf per VM and didn't properly account for bare metal. Make the error arrays per device and add locking. Reviewed-by: NGavin Wan <gavin.wan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 8月, 2017 1 次提交
-
-
由 Christian König 提交于
Move the CSA bo_va from the VM to the fpriv structure. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 14 7月, 2017 1 次提交
-
-
由 Gavin Wan 提交于
This feature works for SRIOV enviroment. For non-SRIOV enviroment, the trans_error function does nothing. The error information includes error_code (16bit), error_flags(16bit) and error_data(64bit). Since there are not many errors, we keep the errors in an array and transfer all errors to Host before amdgpu initialization function (amdgpu_device_init) exit. Signed-off-by: NGavin Wan <Gavin.Wan@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-