- 17 12月, 2021 1 次提交
-
-
由 Christian König 提交于
bo->tbo.resource can now be NULL. Signed-off-by: NChristian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1811Acked-by: NAlex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211210083927.1754-1-christian.koenig@amd.com
-
- 11 11月, 2021 2 次提交
-
-
由 Guchun Chen 提交于
Fixes: 96b8dd44 ("drm/amdgpu/amdgpu_vcn: convert to IP version checking") Signed-off-by: NFlora Cui <flora.cui@amd.com> Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Guchun Chen 提交于
Fixes: b05b9c59 ("drm/amdgpu: clean up set IP function") Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 10 11月, 2021 2 次提交
-
-
由 shaoyunl 提交于
The KFD pre_reset should be called before reset been executed, it will hold the lock to prevent other rocm process to sent the packlage to hiq during host execute the real reset on the HW Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
There was a change(below) target for such issue: d82e2c24 ("drm/amdgpu: Fix crash on device remove/driver unload") But the fix for VI ASICs was missing there. This is a supplement for that. Fixes: d82e2c24 ("drm/amdgpu: Fix crash on device remove/driver unload") Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 06 11月, 2021 7 次提交
-
-
由 Alex Deucher 提交于
Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not set. Fixes: f7f12b25 ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: NEvan Quan <evan.quan@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Felix Kuehling 提交于
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the release_notify callback then the amdgpu_bo is freed. Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-By: NRamesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Evan Quan 提交于
As part of the ib padding process, accessing the RLC_SPM_* register may trigger gfx hang. Since gfxoff may be already kicked during the whole period. To address that, we manually toggle gfx on/off around the RLC_SPM_* register access. This can resolve the gfx hang issue observed on running Talos with RDP launched in parallel. Signed-off-by: NEvan Quan <evan.quan@amd.com> Acked-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
The error count reset for xgmi3x16 pcs is missed. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 YuBiao Wang 提交于
[Why] csb bo is not unpinned in gfx 9. It will lead to pin_count leak on driver unload. [How] Call bo_free_kernel corresponding to bo_create_kernel in gfx_rlc_init_csb. This will also unify the code path with other gfx versions. Signed-off-by: NYuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 YuBiao Wang 提交于
[Why] For Vega10, disabling gart of gfxhub could mess up KIQ and PSP under sriov mode, and lead to DMAR on host side. [How] Skip writing GMC registers under sriov. Signed-off-by: NYuBiao Wang <YuBiao.Wang@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Kent Russell 提交于
BOs need to be reserved before they are added or removed, so ensure that they are reserved during kfd_mem_attach and kfd_mem_detach Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 05 11月, 2021 1 次提交
-
-
由 Jason Gunthorpe 提交于
The huge page functionality in TTM does not work safely because PUD and PMD entries do not have a special bit. get_user_pages_fast() considers any page that passed pmd_huge() as usable: if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) || pmd_devmap(pmd))) { And vmf_insert_pfn_pmd_prot() unconditionally sets entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); eg on x86 the page will be _PAGE_PRESENT | PAGE_PSE. As such gup_huge_pmd() will try to deref a struct page: head = try_grab_compound_head(pmd_page(orig), refs, flags); and thus crash. Thomas further notices that the drivers are not expecting the struct page to be used by anything - in particular the refcount incr above will cause them to malfunction. Thus everything about this is not able to fully work correctly considering GUP_fast. Delete it entirely. It can return someday along with a proper PMD/PUD_SPECIAL bit in the page table itself to gate GUP_fast. Fixes: 314b6580 ("drm/ttm, drm/vmwgfx: Support huge TTM pagefaults") Signed-off-by: NJason Gunthorpe <jgg@nvidia.com> Reviewed-by: NThomas Hellström <thomas.helllstrom@linux.intel.com> Reviewed-by: NChristian König <christian.koenig@amd.com> [danvet: Update subject per Thomas' &Christian's review] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/0-v2-a44694790652+4ac-ttm_pmd_jgg@nvidia.com
-
- 04 11月, 2021 5 次提交
-
-
由 James Zhu 提交于
Remove duplicated kfd_resume_iommu which already runs in mdgpu_amdkfd_device_init. Tested-By: NKen Moffat <zarniwhoop@ntlworld.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NJames Zhu <James.Zhu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Aaron Liu 提交于
For yellow carp, the desired CGPG hysteresis value is 0x4E20. Signed-off-by: NAaron Liu <aaron.liu@amd.com> Acked-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Mario Limonciello 提交于
This is more useful when talking to the SMU team to have the information in this format, save one less step to manually do it. Reviewed-by: NLijo Lazar <lijo.lazar@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Oak Zeng 提交于
Aldebaran has different register mask definitions for regiter MC_VM_XGMI_LFB_CNTL. Use the correct masks to interpret fields of this register. Signed-off-by: NOak Zeng <Oak.Zeng@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingwen Chen 提交于
[Why] In advance tdr mode, the real bad job will be resubmitted twice, while in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job is put one more time than other jobs. [How] Adding dma_fence_get before resbumit job in amdgpu_device_recheck_guilty_jobs and put the fence for normal jobs Signed-off-by: NJingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 10月, 2021 11 次提交
-
-
由 Alex Deucher 提交于
The DMA mask on SI parts is 40 bits not 44. Copy paste typo. Fixes: 244511f3 ("drm/amdgpu: simplify and cleanup setting the dma mask") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1762Acked-by: NChristian König <christian.koenig@amd.com> Tested-by: NPaul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Add secondary instance version info for soc15 parts. Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Add secondary instance version info for vega20, arcturure, and aldebaran. Acked-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Remove GPRs init for ALDEBARAN in gpu reset temporarily, will add the init once the algorithm is stable. v2: Only remove GPRs init in gpu reset. v3: Suspend needs it, only skip it in gpu reset. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Skip GPRs init in specific condition since current GPRs init algorithm only works for some CU settings. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Candice Li 提交于
TA version should only be displayed in firmware version column. Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lang Yu 提交于
amdgpu_fence_driver_sw_fini() should be executed before amdgpu_device_ip_fini(), otherwise fence driver resource won't be properly freed as adev->rings have been tore down. Fixes: 72c8c97b ("drm/amdgpu: Split amdgpu_device_fini into early and late") Signed-off-by: NLang Yu <lang.yu@amd.com> Reviewed-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Lang Yu 提交于
Currently, all kfd BOs use same destruction routine. But pinned BOs are not unpinned properly. Separate them from general routine. v2 (Felix): Add safeguard to prevent user space from freeing signal BO. Kunmap signal BO in the event of setting event page error. Just kunmap signal BO to avoid duplicating the code. Signed-off-by: NLang Yu <lang.yu@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Philip Yang 提交于
The userptr can be unmapped by application and still registered to driver, restore userptr work return user pages will get -EFAULT bad address error. Pretend this error as succeed. GPU access this userptr will have VM fault later, it is better than application soft hangs with stalled user mode queues. v2: squash in warning fix (Alex) Signed-off-by: NPhilip Yang <Philip.Yang@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Kent Russell 提交于
When a GPU hits the bad_page_threshold, it will not be initialized by the amdgpu driver. This means that the table cannot be cleared, nor can information gathering be performed (getting serial number, BDF, etc). If the bad_page_threshold kernel parameter is set to -2, continue to initialize the GPU, while printing a warning to dmesg that this action has been done v2: squash in Luben's fix to restore RAS info reporting Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: NKent Russell <kent.russell@amd.com> Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Kent Russell 提交于
dmesg doesn't warn when the number of bad pages approaches the threshold for page retirement. WARN when the number of bad pages is at 90% or greater for easier checks and planning, instead of waiting until the GPU is full of bad pages. Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 25 10月, 2021 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
In order to better track where in the kernel the dma-buf code is used, put the symbols in the namespace DMA_BUF and modify all users of the symbols to properly import the namespace to not break the build at the same time. Now the output of modinfo shows the use of these symbols, making it easier to watch for users over time: $ modinfo drivers/misc/fastrpc.ko | grep import import_ns: DMA_BUF Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: dri-devel@lists.freedesktop.org Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Acked-by: NChristian König <christian.koenig@amd.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NSumit Semwal <sumit.semwal@linaro.org> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20211010124628.17691-1-gregkh@linuxfoundation.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 22 10月, 2021 8 次提交
-
-
由 Alex Deucher 提交于
The extended bits were not available for use on vega20 and presumably arcturus as well. Fixes: a0f9f854 ("drm/amdgpu/nbio7.4: don't use GPU_HDP_FLUSH bit 12") Reviewed-and-tested-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Some navy flounder boards do not properly mark harvested VCN instances. Fix that here. v2: use IP versions Fixes: 1b592d00 ("drm/amdgpu/vcn: remove manual instance setting") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743Reviewed-and-tested-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
No need to use the id variable, just use the constant plus instance offset directly. Reviewed-by: NLeo Liu <leo.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
No need to use the tmp variable, just use the constant directly. Reviewed-by: NLeo Liu <leo.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Alex Deucher 提交于
Roughly the same code was present in all VCN versions. Consolidate it into a single function. v2: use AMDGPU_UCODE_ID_VCN + i, check if num_inst >= 2 Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NLeo Liu <leo.liu@amd.com> Reviewed-by: NJames Zhu <James.Zhu@amd.com>
-
由 Alex Deucher 提交于
Only enable firmware for the instance that is enabled. v2: use AMDGPU_UCODE_ID_VCN + i Fixes: 1b592d00 ("drm/amdgpu/vcn: remove manual instance setting") Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1743Reviewed-by: NJames Zhu <James.Zhu@amd.com> Reviewed-by: NLeo Liu <leo.liu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Jingwen Chen 提交于
Add dummy_page_addr to sriov msg for host driver to set GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly. v2: should update vf2pf msg instead Signed-off-by: NJingwen Chen <Jingwen.Chen2@amd.com> Reviewed-by: NHorace Chen <horace.chen@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Huang Rui 提交于
PSP firmware will be responsible for applying the GRBM CAM remapping in the production. And the GRBM_CAM_INDEX / GRBM_CAM_DATA registers will be protected by PSP under security policy. So remove it according to the new security policy. Signed-off-by: NHuang Rui <ray.huang@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 21 10月, 2021 1 次提交
-
-
由 Aaron Liu 提交于
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02. The external rev_id for B0 and B1 is 0x20. The original expression is not suitable for B1. v2: squash in fix for display code (Alex) Signed-off-by: NAaron Liu <aaron.liu@amd.com> Reviewed-by: NHuang Rui <ray.huang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
- 20 10月, 2021 1 次提交
-
-
由 Kent Russell 提交于
Change the error message when the bad_page_threshold is reached, explicitly stating that the GPU will not be initialized. Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: NKent Russell <kent.russell@amd.com> Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-