- 05 11月, 2020 1 次提交
-
-
由 Ramesh Errabolu 提交于
drm/amd/amdgpu: Enable arcturus devices to access the method kgd_gfx_v9_get_cu_occupancy that is already defined [Why] Allow user to know number of compute units (CU) that are in use at any given moment. [How] Remove the keyword static for the method kgd_gfx_v9_get_cu_occupancy Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 01 10月, 2020 2 次提交
-
-
由 Ramesh Errabolu 提交于
waves that are in flight. [Why] Allow user to know how many compute units (CU) are in use at any given moment. [How] Read registers of SQ that give number of waves that are in flight of various queues. Use this information to determine number of CU's in use. Signed-off-by: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-By: NHarish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
gfxhub functions are now called from function pointers, instead of from asic-specific functions. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 8月, 2020 2 次提交
-
-
由 Felix Kuehling 提交于
No need to use a function pointer because the implementation is not ASIC-specific. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
No need to use a function pointer because the implementation is not ASIC-specific. This fixes missing support due to a missing function pointer on Arcturus. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 8月, 2020 1 次提交
-
-
由 Dennis Li 提交于
if other threads have holden the reset lock, recovery will fail to try_lock. Therefore we introduce atomic hive->in_reset and adev->in_gpu_reset, to avoid reentering GPU recovery. v2: drop "? true : false" in the definition of amdgpu_in_reset Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 8月, 2020 1 次提交
-
-
由 Huang Rui 提交于
Renoir only has one sdma instance, it will get failed once query the sdma1 registers. So use switch-case instead of static register array. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 8月, 2020 1 次提交
-
-
由 Huang Rui 提交于
Renoir only has one sdma instance, it will get failed once query the sdma1 registers. So use switch-case instead of static register array. Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 8月, 2020 2 次提交
-
-
由 Christian König 提交于
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1a. Signed-off-by: NChristian König <christian.koenig@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Add more function pointers to amdgpu_mmhub_funcs. ASIC specific implementation of most mmhub functions are called from a general function pointer, instead of calling different function for different ASIC. Simplify the code by deleting duplicate functions Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 7月, 2020 1 次提交
-
-
由 Dennis Li 提交于
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: NChristian König <christian.koenig@amd.com> Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com> Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com> Suggested-by: NLuben Tukov <luben.tuikov@amd.com> Signed-off-by: NDennis Li <Dennis.Li@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 6月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Patch series "improve use_mm / unuse_mm", v2. This series improves the use_mm / unuse_mm interface by better documenting the assumptions, and my taking the set_fs manipulations spread over the callers into the core API. This patch (of 3): Use the proper API instead. Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de These helpers are only for use with kernel threads, and I will tie them more into the kthread infrastructure going forward. Also move the prototypes to kthread.h - mmu_context.h was a little weird to start with as it otherwise contains very low-level MM bits. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NJens Axboe <axboe@kernel.dk> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felipe Balbi <balbi@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Zhenyu Wang <zhenyuw@linux.intel.com> Cc: Zhi Wang <zhi.a.wang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 4月, 2020 2 次提交
-
-
由 Jack Zhang 提交于
This reverts commit 5161bba4311f in order to split it into two different patches, and this will make it easier to understand. [PATCH 1/2] porting to gfx10 from commit 1b0bfcff ("drm/amdgpu: Avoid destroy hqd when GPU is on reset") Originally, MEC is touched without GPU initialized first. Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jack Zhang 提交于
kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate Without this change, sriov tdr code path will never free those allocated memories and get memory leak. v2:add a bugfix for kiq ring test fail Signed-off-by: NJack Zhang <Jack.Zhang1@amd.com> Reviewed-by: NMonk Liu <monk.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 2月, 2020 1 次提交
-
-
由 Yong Zhao 提交于
Given we can query all the asic specific information from amdgpu_gfx_config, we can make get_tile_config() generic. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 2月, 2020 1 次提交
-
-
由 Divya Shikre 提交于
Devices from Arcturus onwards will have their UUID exposed to Thunk. Adding neccessary functions to the kernel to propagate the uuid. Signed-off-by: NDivya Shikre <DivyaUday.Shikre@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 17 1月, 2020 2 次提交
-
-
由 Aaron Liu 提交于
There is an issue that CP will check the HIQ queue to be configured and mapped with KIQ ring, otherwise, it will be unable to read back the secure buffer while the gfxoff is enabled even with trusted IP blocks. v1 -> v2: - Fix to remove surplus set_resources packets. - Fill the whole configuration in MQD. - Change the author as Aaron because he addressed the key point of this issue. - Add kiq ring lock. v2 -> v3: - Free the lock while in error return case. - Remove the programming only needed by the queue is unmapped. v3 -> v4: - Remove doorbell programming because it's used for restarting queue. - Remove CP scheduler programming because map_queue packet will handle this. v4 -> v5: - Remove cp_hqd_active because mec ucode will enable it while use map_queues. - Revise goto out_unlock. - Correct the right doorbell offset for HIQ that kfd driver assigned in the packet. v5 -> v6: - Merge Arcturus fix into this patch because it will get oops in Arcturus platform. Reported-by: NLisa Saturday <Lisa.Saturday@amd.com> Signed-off-by: NAaron Liu <aaron.liu@amd.com> Signed-off-by: NHuang Rui <ray.huang@amd.com> Reviewed-and-Tested-by: NAaron Liu <aaron.liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Sierra 提交于
[Why] kfd2kgd interface will be deprecated. This removal only covers TLB invalidation for now. They have been replaced in amdgpu_amdkfd API. [How] TLB invalidate functions removed from the different amdkfd_gfx_v* versions. Signed-off-by: NAlex Sierra <alex.sierra@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 12月, 2019 1 次提交
-
-
由 Yong Zhao 提交于
Since Arcturus has it own function pointer, we can move Arcturus specific logic to there rather than leaving it entangled with other GFX9 chips. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 04 12月, 2019 1 次提交
-
-
由 Yong Zhao 提交于
Adjust the exposed function prototype so that the caller does not need to know the MMHUB number. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 30 10月, 2019 1 次提交
-
-
由 Yong Zhao 提交于
Given amdkfd.ko has been merged into amdgpu.ko, this switch is no longer useful. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 10月, 2019 8 次提交
-
-
由 Yong Zhao 提交于
This is the same idea as the kfd device info probe and move all the probe control together for easy maintenance. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
Those header file includes are not needed. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
get_atc_vmid_pasid_mapping_valid() is very similar to get_atc_vmid_pasid_mapping_pasid(), so they can be merged into a new function get_atc_vmid_pasid_mapping_info() to reduce register access times. More importantly, getting the PASID and the valid bit atomically with a single read fixes some potential race conditions where the mapping changes between the two reads. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
They are not used anywhere. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
The old name is prone to confusion. The register offset is for a RLC queue rather than a SDMA engine. The value is not a base address, but a register offset. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
HW folks have confirm that we should not touch RESUME_CTX of SDMA*_GFX_CONTEXT_CNTL when manipulating RLC queues. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
Currently this function pointer is missing for GFX10. Considering it is a void function since GFX9, fix it by checking the function pointer before dereferencing it. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
The message will be useful when troubleshooting the issues. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 8月, 2019 1 次提交
-
-
由 Oak Zeng 提交于
This is for kfd to reuse amdgpu TLB invalidation function. On gfx10, kfd only needs to flush TLB on gfx hub but not on mm hub. So export a function for KFD flush TLB only on specific hub. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NChristian Konig <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 7月, 2019 2 次提交
-
-
由 Oak Zeng 提交于
Arcturus shares most of the kfd2kgd_calls with gfx9. But due to SDMA register address change, it can't share SDMA related functions. Export gfx9 kfd2kgd_calls and implement SDMA related functions for Arcturus. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
2 mmhubs on arcturus. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 6月, 2019 1 次提交
-
-
由 Sam Ravnborg 提交于
Drop use of drmP.h in all files named amdgpu* in drm/amd/amdgpu/ Fix fallout. Signed-off-by: NSam Ravnborg <sam@ravnborg.org> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
-
- 25 5月, 2019 2 次提交
-
-
由 shaoyunl 提交于
There is a bug found in vml2 xgmi logic: mtype is always sent as NC on the VMC to TC interface for a page walk, regardless of whether the request is being sent to local or remote GPU. NC means non-coherent and will cause the VMC return data to be cached in the TCC (versus UC – uncached will not cache the data). Since the page table updates are being done by SDMA/HDP, then TCC will never be updated and the GC VML2 will continue to hit on the TCC and never get the updated page tables and result in a fault. Heave weigh tlb invalidation does a WB/INVAL of the L1/L2 GL data caches so TCC will not be hit on next request Signed-off-by: Nshaoyunl <Shaoyun.Liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Trigger Huang 提交于
Under Vega10 SR-IOV, with new RLC's new feature, VF should call RLC to program some registers if supported Signed-off-by: NTrigger Huang <Trigger.Huang@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 4月, 2019 1 次提交
-
-
由 Amber Lin 提交于
Method of getting firmware version is the same across ASICs, so remove them from ASIC-specific files and create one in amdgpu_amdkfd.c. This new created get_fw_version simply reads fw_version from adev->gfx than parsing the ucode header. Signed-off-by: NAmber Lin <Amber.Lin@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 12月, 2018 1 次提交
-
-
由 Kuehling, Felix 提交于
Avoid including mmu_context.h in amdgpu_amdkfd.h since that may be included in other header files that define traces. This leads to conflicts due to traces defined in other headers included via mmu_context.h. Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 11月, 2018 3 次提交
-
-
由 Alex Deucher 提交于
Use the appropriate mmhub and gfxhub headers rather than adding them to the gmc9 header. Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Andrey Grodzovsky 提交于
Start using drm_gpu_scheduler.ready isntead. v3: Add helper function to run ring test and set sched.ready flag status accordingly, clean explicit sched.ready sets from the IP specific files. v4: Add kerneldoc and rebase. Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yong Zhao 提交于
As part of the change, we stop taking the srbm lock, and start to use the same invalidation engine and software lock as amdgpu. Signed-off-by: NYong Zhao <Yong.Zhao@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-