- 10 Feb, 2022 3 commits
-
-
Committed by Andrey Grodzovsky
The reset domain now contains the register access semaphore, so it needs to be present as long as each device in a hive needs it, and therefore it cannot be bound to the XGMI hive life cycle. Address this by making the reset domain refcounted and pointed to by each member of the hive and by the hive itself.
v4: Fix crash on boot with XGMI hive by adding a type to reset_domain. XGMI will only create a new reset_domain if the previous one was of single device type, meaning it's the first boot. Otherwise it will take a refcount on the existing reset_domain from the amdgpu device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo)
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
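The refcounting idea in this commit can be illustrated with a small userspace sketch. Everything here is an assumption made for illustration: the names `reset_domain_create`, `reset_domain_get`, `reset_domain_put` and the `live_domains` counter are hypothetical, and the driver itself would rely on the kernel's own refcounting helpers rather than C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the driver's actual definitions. */
enum reset_domain_type { SINGLE_DEVICE, XGMI_HIVE };

struct reset_domain {
    atomic_int refcount;
    enum reset_domain_type type;
};

/* Bookkeeping for this sketch only: number of domains not yet freed. */
static int live_domains;

/* Create a domain holding one reference for the caller. */
struct reset_domain *reset_domain_create(enum reset_domain_type type)
{
    struct reset_domain *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->refcount, 1);
    d->type = type;
    live_domains++;
    return d;
}

/* Take an extra reference, e.g. when a device joins a hive. */
struct reset_domain *reset_domain_get(struct reset_domain *d)
{
    atomic_fetch_add(&d->refcount, 1);
    return d;
}

/* Drop a reference; free on the last one. Returns 1 if the domain was freed. */
int reset_domain_put(struct reset_domain *d)
{
    int old = atomic_fetch_sub(&d->refcount, 1);

    assert(old > 0);
    if (old == 1) {
        free(d);
        live_domains--;
        return 1;
    }
    return 0;
}
```

With this shape, the domain outlives the hive object as long as any member device still holds a reference, which is the property the commit message asks for.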
-
Committed by Andrey Grodzovsky
Use the reset domain wq also for non-TDR GPU recovery triggers such as sysfs and RAS. We must serialize all possible GPU recoveries to guarantee no concurrency there. For TDR, call the original recovery function directly since it's already executed from within the wq. For others, just use a wrapper to queue work and wait on it to finish.
v2: Rename to amdgpu_recover_work_struct
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74113.html
-
Committed by Andrey Grodzovsky
Define a reset_domain struct such that all the entities that go through reset together will be serialized against one another. Do it for both single device and XGMI hive cases.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Suggested-by: Christian König <ckoenig.leichtzumerken@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74111.html
-
- 31 Dec, 2021 1 commit
-
-
Committed by Alex Deucher
If we are the primary adapter (i.e., the one used by the firmware framebuffer), disable runtime pm. This fixes a regression caused by commit 55285e21 which results in the displays waking up shortly after they go to sleep due to the device coming out of runtime suspend and sending a hotplug uevent.
v2: squash in reworked fix from Evan
Fixes: 55285e21 ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 30 Dec, 2021 1 commit
-
-
Committed by Kent Russell
Having seen at least one 42-character product_name, bump the buffer size up to 64, and put that definition into amdgpu.h to make future adjustments simpler.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 14 Dec, 2021 3 commits
-
-
Committed by Philip Yang
If the host and amdgpu IOMMU is not enabled, or the IOMMU is in pass-through mode, set the adev->ram_is_direct_mapped flag, which will be used to optimize memory usage for multi-GPU mappings.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Lang Yu
It is useful to maintain error context when debugging SW/FW issues. Introduce amdgpu_device_halt() for this purpose. It will bring the hardware to a kind of halt state, so that no one can touch it any more. Compared to a simple hang, the system will stay stable, at least for SSH access. Then it should be trivial to inspect the hardware state and see what's going on.
v2: Set adev->no_hw_access earlier to avoid potential crashes. (Christian)
Suggested-by: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.co>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Isabella Basso
This fixes the warning below by moving the prototype to a location that's actually included by the .c files that call amdgpu_kms_compat_ioctl:
    warning: no previous prototype for ‘amdgpu_kms_compat_ioctl’ [-Wmissing-prototypes]
    37 | long amdgpu_kms_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
       |      ^~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 02 Dec, 2021 1 commit
-
-
Committed by Lijo Lazar
HW_ID_MAX considers the HWIDs of all IPs, far more than what amdgpu uses. amdgpu tracks only the IPs defined by amd_hw_ip_block_type, whose max is MAX_HWIP.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 17 Nov, 2021 1 commit
-
-
Committed by Christian König
Just grab all fences for the display flip in one go.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028132630.2330-2-christian.koenig@amd.com
-
- 29 Oct, 2021 1 commit
-
-
Committed by Kent Russell
When a GPU hits the bad_page_threshold, it will not be initialized by the amdgpu driver. This means that the table cannot be cleared, nor can information gathering be performed (getting serial number, BDF, etc). If the bad_page_threshold kernel parameter is set to -2, continue to initialize the GPU, while printing a warning to dmesg that this action has been done.
v2: squash in Luben's fix to restore RAS info reporting
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 06 Oct, 2021 3 commits
-
-
Committed by Guchun Chen
In the current code, when the PCI error state pci_channel_io_normal is detected, it will report PCI_ERS_RESULT_CAN_RECOVER status to the PCI driver, and the PCI driver will continue the execution of the PCI resume callback report_resume via pci_walk_bridge. The callback finally goes into amdgpu_pci_resume, where the write lock is released unconditionally without having been acquired first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in the amdgpu_device structure to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen.
Fixes: c9a6b82f ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
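The guard described in this commit message can be sketched in plain C. This is a simplified model under stated assumptions: `device_sketch`, `on_error_detected` and `on_resume` are hypothetical names, the enum values stand in for pci_channel_io_normal and pci_channel_io_frozen, and a boolean replaces the real write lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for pci_channel_io_normal / pci_channel_io_frozen. */
enum channel_state { CHANNEL_IO_NORMAL, CHANNEL_IO_FROZEN };

/* Hypothetical, heavily reduced device state for this sketch. */
struct device_sketch {
    enum channel_state cached_state; /* remembered from error detection */
    bool write_locked;               /* models the write lock being held */
};

/* Error-detected callback: cache the state; lock only on the frozen path. */
void on_error_detected(struct device_sketch *dev, enum channel_state state)
{
    dev->cached_state = state;
    if (state == CHANNEL_IO_FROZEN)
        dev->write_locked = true;
}

/* Resume callback: unlock only if the frozen path took the lock earlier.
 * Returns true if the resume work actually ran. */
bool on_resume(struct device_sketch *dev)
{
    if (dev->cached_state != CHANNEL_IO_FROZEN)
        return false; /* nothing was locked, so nothing to release */

    assert(dev->write_locked);
    dev->write_locked = false;
    return true;
}
```

The point of caching the state is that the resume callback has no other way to know which path the error-detected callback took.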
-
Committed by Guchun Chen
In the current code, when the PCI error state pci_channel_io_normal is detected, it will report PCI_ERS_RESULT_CAN_RECOVER status to the PCI driver, and the PCI driver will continue the execution of the PCI resume callback report_resume via pci_walk_bridge. The callback finally goes into amdgpu_pci_resume, where the write lock is released unconditionally without having been acquired first. In this case, a deadlock will happen when other threads start to acquire the read lock. To fix this, add a member in the amdgpu_device structure to cache pci_channel_state, and only continue the execution in amdgpu_pci_resume when it's pci_channel_io_frozen.
Fixes: c9a6b82f ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Christian König
This reverts commit 728e7e0c. Further discussion reveals that this feature is severely broken and needs to be reverted ASAP. GPU reset can never be delayed by userspace, even for debugging; otherwise we can run into in-kernel deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 05 Oct, 2021 6 commits
-
-
Committed by Alex Deucher
Allow us to query instance versions more cleanly. Instancing support is not consistent, unfortunately. SDMA is a good example. Sienna Cichlid has 4 total SDMA instances, each enumerated separately (HWIDs 42, 43, 68, 69). Arcturus has 8 total SDMA instances, but they are enumerated as multiple instances of the same HWIDs (4x HWID 42, 4x HWID 43). UMC is another example. On most chips there are multiple instances with the same HWID. This allows us to support both forms.
v2: rebase
v3: clarify instancing support
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
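One way to picture per-instance version tracking is a table indexed by IP block and instance number. The sketch below is illustrative only: the table dimensions, the `SKETCH_IP_VERSION` packing, and the `set_ip_version` / `get_ip_version` helpers are assumptions, not the driver's actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes chosen for the sketch. */
#define SKETCH_MAX_IP        4
#define SKETCH_MAX_INSTANCE  8

/* Assumed packing: major, minor, revision in one 32-bit value. */
#define SKETCH_IP_VERSION(maj, min, rev) \
    (((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* One version slot per (IP block, instance); 0 means "not present". */
static uint32_t ip_versions[SKETCH_MAX_IP][SKETCH_MAX_INSTANCE];

/* Record a version; returns 0 on success, -1 if the indices are out of range. */
int set_ip_version(unsigned int ip, unsigned int inst, uint32_t version)
{
    if (ip >= SKETCH_MAX_IP || inst >= SKETCH_MAX_INSTANCE)
        return -1;
    ip_versions[ip][inst] = version;
    return 0;
}

/* Look up a version; returns 0 for unknown or out-of-range entries. */
uint32_t get_ip_version(unsigned int ip, unsigned int inst)
{
    if (ip >= SKETCH_MAX_IP || inst >= SKETCH_MAX_INSTANCE)
        return 0;
    return ip_versions[ip][inst];
}
```

Both enumeration styles from the message fit this shape: separately enumerated HWIDs each fill instance 0 of their own row, while repeated HWIDs fill several instances of one row.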
-
Committed by Alex Deucher
So we can store the VCN IP revision for each instance of VCN.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Deucher
So we can grab the appropriate DCE info out of the IP discovery table. This is a separate IP from DCN.
Acked-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Deucher
So we can grab the appropriate XGMI info out of the IP discovery table.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Deucher
So we can check the IP versions directly rather than using the ASIC type.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Deucher
Useful for debugging and new ASIC validation.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 15 Sep, 2021 1 commit
-
-
Committed by Ernst Sjöstrand
Seems like newer cards can have even more instances now. Found by UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29 index 8 is out of range for type 'uint32_t *[8]'
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 03 Sep, 2021 1 commit
-
-
Committed by Ernst Sjöstrand
Seems like newer cards can have even more instances now. Found by UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:318:29 index 8 is out of range for type 'uint32_t *[8]'
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1697
Cc: stable@vger.kernel.org
Signed-off-by: Ernst Sjöstrand <ernstp@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 25 Aug, 2021 1 commit
-
-
Committed by John Clements
Add MCA-specific IP blocks targeting RAS features.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 17 Aug, 2021 1 commit
-
-
Committed by Evan Quan
Currently, the fan speed PWM readout is converted to a percentage and then back to a PWM value. However, the conversion to a percentage is totally unnecessary and makes the final output less accurate.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
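The accuracy loss from the intermediate percentage step is easy to show with integer arithmetic. The sketch below assumes a raw duty value scaled against a full-scale value `duty100` and a 0..255 PWM range; both function names and that scaling are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two-step conversion: raw duty -> percent -> PWM (0..255).
 * The integer division to percent discards information. */
uint32_t pwm_via_percent(uint32_t duty, uint32_t duty100)
{
    uint32_t percent;

    if (duty100 == 0)
        return 0;
    percent = duty * 100 / duty100;
    return percent * 255 / 100;
}

/* Direct conversion: raw duty -> PWM (0..255) with a single division. */
uint32_t pwm_direct(uint32_t duty, uint32_t duty100)
{
    if (duty100 == 0)
        return 0;
    return duty * 255 / duty100;
}
```

For a raw duty of 128 out of 255, the two-step path rounds down to 50 percent first and ends at 127, while the direct path yields 128.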
-
- 06 Aug, 2021 1 commit
-
-
Committed by Ryan Taylor
Modify the VKMS driver into an API that dce_virtual can use to create virtual displays that obey DRM's atomic modesetting API.
v2: Made local functions static.
v3: Switched vkms_output kzalloc for kcalloc. Cleanup patches by moving display mode fixes to this patch.
v4: Update atomic_check and atomic_update to comply with new KMS API.
Signed-off-by: Ryan Taylor <Ryan.Taylor@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 28 Jul, 2021 1 commit
-
-
Committed by Pratik Vishwakarma
Rename amdgpu_acpi_is_s0ix_supported to amdgpu_acpi_is_s0ix_active to better explain its functionality.
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 17 Jul, 2021 3 commits
-
-
Optimized the code for codec info structure initialization.
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Kevin Wang
Split amdgpu_device_access_vram() into:
1. amdgpu_device_mm_access(): using MM_INDEX/MM_DATA to access VRAM
2. amdgpu_device_aper_access(): using the VRAM aperture to access VRAM (optional)
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
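A split like this lends itself to a "fast path first, slow path for the rest" structure. The sketch below models that under loud assumptions: the simulated `vram` array, the window size, the byte-wise indexed path, and the helpers `aper_read`, `mm_read` and `vram_read` are all hypothetical, and how the driver actually combines the two functions is not stated in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VRAM_SIZE     64  /* simulated VRAM, sketch only */
#define VISIBLE_SIZE  16  /* hypothetical CPU-visible aperture window */

static uint8_t vram[VRAM_SIZE];
static size_t indexed_accesses; /* counts slow-path accesses */

/* Fast path: copy whatever falls inside the visible window.
 * Returns the number of bytes handled (possibly 0). */
size_t aper_read(size_t pos, void *buf, size_t size)
{
    size_t n;

    if (pos >= VISIBLE_SIZE)
        return 0;
    n = size < VISIBLE_SIZE - pos ? size : VISIBLE_SIZE - pos;
    memcpy(buf, vram + pos, n);
    return n;
}

/* Slow path: models one index/data register pair access per element. */
void mm_read(size_t pos, void *buf, size_t size)
{
    uint8_t *out = buf;
    size_t i;

    for (i = 0; i < size; i++) {
        out[i] = vram[pos + i];
        indexed_accesses++;
    }
}

/* Combined access: try the aperture, fall back to indexed access. */
void vram_read(size_t pos, void *buf, size_t size)
{
    size_t done;

    assert(pos <= VRAM_SIZE && size <= VRAM_SIZE - pos);
    done = aper_read(pos, buf, size);
    if (done < size)
        mm_read(pos + done, (uint8_t *)buf + done, size - done);
}
```

Keeping the two paths as separate functions lets a caller that knows the aperture is unavailable skip straight to the indexed path.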
-
Optimized the code for codec info structure initialization.
Signed-off-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 05 Jun, 2021 2 commits
-
-
Committed by Eric Huang
Integrate two generic functions to determine if an HDP flush is needed for all ASICs.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Sathishkumar S
Add a sysfs attr to read/write the smartshift bias level. Document the smartshift_bias sysfs attr.
V2: add attr to amdgpu_device_attrs and use attr_update (Lijo)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 02 Jun, 2021 2 commits
-
-
Committed by Sathishkumar S
Enable smart shift on the dGPU if it is part of an HG system and the platform supports the ATCS method to handle power shift.
V2: avoid psc updates in baco enter and exit (Lijo); fix alignment (Shashank)
V3: rebased on unified ATCS handling. (Alex)
V4: check for return value and warn on failed update (Shashank); return 0 if device does not support smart shift. (Lizo)
V5: rebased on ATPX/ATCS structures global (Alex)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Sathishkumar S
Add support to handle the ATCS method for power shift control. It is used to communicate the dGPU device state to the SBIOS.
V2: use defined acpi func for checking psc support (Lijo); fix alignment (Shashank)
V3: rebased on unified ATCS handling (Alex)
V4: rebased on ATPX/ATCS structures global (Alex)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 28 May, 2021 1 commit
-
-
Committed by Alex Deucher
They are global ACPI methods, so make the structures global in the driver. This simplifies a number of things in the handling of these methods.
v2: reset the handle if verify interface fails (Lijo)
v3: fix compilation when ACPI is not defined.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 25 May, 2021 1 commit
-
-
Committed by Andrey Grodzovsky
Make its name describe the function rather than the feature.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210521204122.762288-1-andrey.grodzovsky@amd.com
-
- 22 May, 2021 1 commit
-
-
Committed by Alex Deucher
Treat it like ATIF and check both the dGPU and APU for the method. This is required because ATCS may be hung off of the APU in ACPI on A+A systems.
v2: add back accidentally removed ACPI handle check.
v3: Fix incorrect atif check (Colin); fix uninitialized variable (Colin)
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 21 May, 2021 1 commit
-
-
Committed by Peng Ju Zhou
This patch series is used for GC/MMHUB(part)/IH_RB_CNTL indirect access in the SRIOV environment. There are 4 bits, controlled by the host, that determine whether GC/MMHUB(part)/IH_RB_CNTL indirect access is enabled (one bit is a master bit that controls the other 3 bits).
For GC registers, change all register access from MMIO to RLC and use RLC as the default access method during full access time.
For partial MMHUB registers, change their access from MMIO to RLC during full access time; the remaining registers keep the original access method.
For the IH_RB_CNTL register, change its access from MMIO to PSP.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 20 May, 2021 2 commits
-
-
Committed by Andrey Grodzovsky
Some of the work in amdgpu_device_fini, such as disabling HW interrupts and finalizing pending fences, must be done right away on pci_remove, while most of the work related to finalizing and releasing driver data structures can be deferred until the drm_driver.release hook is called, i.e. when the last device reference is dropped.
v4: Change functions prefix early->hw and late->sw
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com
-
Committed by Likun GAO
Decide whether to add a SW IP according to the harvest info.
v2: fix indentation (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 13 May, 2021 1 commit
-
-
Committed by Likun GAO
Decide whether to add a SW IP according to the harvest info.
v2: fix indentation (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-