- 02 6月, 2022 2 次提交
-
-
由 Candice Li 提交于
Adjust the sequence for ras late init and separate ras reset error status from query status. v2: squash in fix from Candice Signed-off-by: NCandice Li <candice.li@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Stanley.Yang 提交于
Fix aldebaran ras supported check on SRIOV guest side, the previous check conditicon block all ras feature on baremetal Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 5月, 2022 1 次提交
-
-
由 Stanley.Yang 提交于
support umc/gfx/sdma ras on guest side Changed from V1: move sriov judgment in amdgpu_ras_interrupt_fatal_error_handler Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 11 5月, 2022 2 次提交
-
-
由 Tao Zhou 提交于
Qeury ras status before ras poison consumption handling, add more comment and log. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-and-tested-by: NMohammad Zafar Ziya <Mohammadzafar.ziya@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Enable RAS IH if poison consumption handler is implemented. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NMohammad Zafar Ziya <Mohammadzafar.ziya@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 4月, 2022 1 次提交
-
-
由 Haowen Bai 提交于
After alloc fail, we do not need to kfree. Signed-off-by: NHaowen Bai <baihaowen@meizu.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 23 4月, 2022 3 次提交
-
-
由 Tao Zhou 提交于
The fatal error handler is independent from general ras interrupt handler since there is no related IH ring. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Add support for general RAS poison consumption handler. v2: remove callback function for poison consumption. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Tao Zhou 提交于
Prepare for the implementation of poison consumption handler. v2: separate umc handler from poison creation. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 29 3月, 2022 1 次提交
-
-
由 Mohammad Zafar Ziya 提交于
Add vcn and jpeg ras support options V2: vcn and jpeg ras flag enabled for aldebaran asic only V3: vcn and jpeg ras flag disabled for error counter query Generic poison query interface added VCN and JPEG ras enabled based on IP version check V4: vcn and jpeg ras flag moved under ecc flag for dGPU Signed-off-by: NMohammad Zafar Ziya <Mohammadzafar.ziya@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 16 3月, 2022 1 次提交
-
-
由 Stanley.Yang 提交于
It should notice SMU to update bad channel info when detected uncorrectable error in UMC block Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 3月, 2022 2 次提交
-
-
由 yipechai 提交于
1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as .ras_fini common function, which is called when .ras_fini of ras block isn't initialized. 2. Remove the code of using amdgpu_ras_block_late_fini to initialize .ras_fini in ras blocks. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
centrally calls the .ras_fini function of all ras blocks. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 24 2月, 2022 2 次提交
-
-
由 yipechai 提交于
Fixed warning reported by kernel test robot: 1.warning: variable 'ras_obj' is used uninitialized whenever '||' condition is true. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Maíra Canal 提交于
Turn previously global function into a static function to avoid the following Clang warning: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:5: warning: no previous prototype for function 'amdgpu_ras_block_late_init_default' [-Wmissing-prototypes] int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev, ^ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev, ^ static Signed-off-by: NMaíra Canal <maira.canal@usp.br> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 18 2月, 2022 3 次提交
-
-
由 Tom Rix 提交于
Clang build fails with amdgpu_ras.c:2416:7: error: variable 'ras_obj' is used uninitialized whenever 'if' condition is true if (adev->in_suspend || amdgpu_in_reset(adev)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ amdgpu_ras.c:2453:6: note: uninitialized use occurs here if (ras_obj->ras_cb) ^~~~~~~ There is a logic error in the error handler's labels. ex/ The sysfs: is the last goto label in the normal code but is the middle of error handler. Rework the error handler. cleanup: is the first error, so it's handler should be last. interrupt: is the second error, it's handler is next. interrupt: handles the failure of amdgpu_ras_interrupt_add_hander() by calling amdgpu_ras_interrupt_remove_handler(). This is wrong, remove() assumes the interrupt has been setup, not torn down by add(). Change the goto label to cleanup. sysfs is the last error, it's handler should be first. sysfs: handles the failure of amdgpu_ras_sysfs_create() by calling amdgpu_ras_sysfs_remove(). But when the create() fails there is nothing added so there is nothing to remove. This error handler is not needed. Remove the error handler and change goto label to interrupt. Fixes: b293e891 ("drm/amdgpu: add helper function to do common ras_late_init/fini (v3)") Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com> Signed-off-by: NTom Rix <trix@redhat.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1. Define amdgpu_ras_block_late_init_default in amdgpu_ras.c as .ras_late_init common function, which is called when .ras_late_init of ras block isn't initialized. 2. Remove the code of using amdgpu_ras_block_late_init to initialize .ras_late_init in ras blocks. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Define amdgpu_ras_late_init to call all ras blocks' .ras_late_init. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 2月, 2022 4 次提交
-
-
由 yipechai 提交于
drm/amdgpu: Merge amdgpu_ras_late_init/amdgpu_ras_late_fini to amdgpu_ras_block_late_init/amdgpu_ras_block_late_fini 1. Merge amdgpu_ras_late_init to amdgpu_ras_block_late_init. 2. Remove amdgpu_ras_late_init since no ras block calls amdgpu_ras_late_init. 3. Merge amdgpu_ras_late_fini to amdgpu_ras_block_late_fini. 4. Remove amdgpu_ras_late_fini since no ras block calls amdgpu_ras_late_fini. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
In order to reduce redundant struct conversion, modify operating sysfs and interrupt function interface parameters. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1. Define amdgpu_ras_block_late_init to create sysfs nodes and interrupt handles. 2. Define amdgpu_ras_block_late_fini to remove sysfs nodes and interrupt handles. 3. Replace ras block variable members in struct amdgpu_ras_block_object with struct ras_common_if, which can make it easy to associate each ras block instance with each ras block functional interface. 4. Add .ras_cb to struct amdgpu_ras_block_object. 5. Change each ras block to fit for the changement of struct amdgpu_ras_block_object. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 08 2月, 2022 3 次提交
-
-
由 yipechai 提交于
The commit d5e8ff5f ("drm/amdgpu: Fixed the defect of soft lock caused by infinite loop") had fixed this defect. Revert workaround commit a2170b4a ("drm/amdgpu: Add judgement to avoid infinite loop"). Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1. The infinite loop case only occurs on multiple cards support ras functions. 2. The explanation of root cause refer to commit 76641cbbf196 ("drm/amdgpu: Add judgement to avoid infinite loop"). 3. Create new node to manage each unique ras instance to guarantee each device .ras_list is completely independent. 4. Fixes: commit 7a6b8ab3231b51 ("drm/amdgpu: Unify ras block interface for each ras block"). 5. The soft locked logs are as follows: [ 262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu [ 262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020 [ 262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu] [ 262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45 [ 262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202 [ 262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248 [ 262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000 [ 262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001 [ 262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e [ 262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003 [ 262.166258] FS: 0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000 [ 262.166261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0 [ 262.166267] Call Trace: [ 262.166272] amdgpu_ras_do_recovery+0x130/0x290 [amdgpu] [ 262.166529] ? psi_task_switch+0xd2/0x250 [ 262.166537] ? __switch_to+0x11d/0x460 [ 262.166542] ? __switch_to_asm+0x36/0x70 [ 262.166549] process_one_work+0x220/0x3c0 [ 262.166556] worker_thread+0x4d/0x3f0 [ 262.166560] ? process_one_work+0x3c0/0x3c0 [ 262.166563] kthread+0x12b/0x150 [ 262.166568] ? set_kthread_struct+0x40/0x40 [ 262.166571] ret_from_fork+0x22/0x30 Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Luben Tuikov 提交于
MESA polls for errors every 2-3 seconds. Printing with dev_info() causes the dmesg log to fill up with the same message, e.g, [18028.206676] amdgpu 0000:0b:00.0: amdgpu: df doesn't config ras function. Make it dev_dbg_once(), as it isn't something correctible during boot or thereafter, so printing just once is sufficient. Also sanitize the message. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: yipechai <YiPeng.Chai@amd.com> Fixes: 8b0fb0e9 ("drm/amdgpu: Modify gfx block to fit for the unified ras block data and ops") Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com> Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 03 2月, 2022 1 次提交
-
-
由 yipechai 提交于
1. The infinite loop causing soft lock occurs on multiple amdgpu cards supporting ras feature. 2. This a workaround patch to fix 6492e1b0. It is valid for multiple amdgpu cards of the same type. 3. The root cause is that each GPU card device has a separate .ras_list link header, but the instance and linked list node of each ras block are unique. When each device is initialized, each ras instance will repeatedly add link node to the device every time. In this way, only the .ras_list of the last initialized device is completely correct. the .ras_list->prev and .ras_list->next of the device initialzied before can still point to the correct ras instance, but the prev pointer and next pointer of the pointed ras instance both point to the last initialized device's .ras_ list instead of the beginning .ras_ list. When using list_for_each_entry_safe searches for non-existent Ras nodes on devices other than the last device, the last ras instance next pointer cannot always be equal to the beginning .ras_list, so that the loop cannot be terminated, the program enters a infinite loop. BTW: Since the data and initialization process of each card are the same, the link list between ras instances will not be destroyed every time the device is initialized. 4. The soft locked logs are as follows: [ 262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G OE 5.13.0-27-generic #29~20.04.1-Ubuntu [ 262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020 [ 262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [ 262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu] [ 262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45 [ 262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202 [ 262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248 [ 262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000 [ 262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001 [ 262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e [ 262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003 [ 262.166258] FS: 0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000 [ 262.166261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0 [ 262.166267] Call Trace: [ 262.166272] amdgpu_ras_do_recovery+0x130/0x290 [amdgpu] [ 262.166529] ? psi_task_switch+0xd2/0x250 [ 262.166537] ? __switch_to+0x11d/0x460 [ 262.166542] ? __switch_to_asm+0x36/0x70 [ 262.166549] process_one_work+0x220/0x3c0 [ 262.166556] worker_thread+0x4d/0x3f0 [ 262.166560] ? process_one_work+0x3c0/0x3c0 [ 262.166563] kthread+0x12b/0x150 [ 262.166568] ? set_kthread_struct+0x40/0x40 [ 262.166571] ret_from_fork+0x22/0x30 Fixes: 6492e1b0 ("drm/amdgpu: Unify ras block interface for each ras block") Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 28 1月, 2022 1 次提交
-
-
由 Tao Zhou 提交于
Create common amdgpu_umc_fill_error_record function for all versions of UMC and clean up related codes. Signed-off-by: NTao Zhou <tao.zhou1@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 26 1月, 2022 1 次提交
-
-
由 yipechai 提交于
This reverts commit df4f0041. Xgmi ras initialization had been moved from .late_init to early_init, the defect of repeated calling amdgpu_ras_register_ras_block had been fixed, so revert this patch. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 20 1月, 2022 1 次提交
-
-
由 yipechai 提交于
Remove repeated calls. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 19 1月, 2022 2 次提交
-
-
由 yipechai 提交于
Fix the code style warnings in amdgpu_ras: 1. ERROR: space required before the open parenthesis '('. 2. WARNING: line length of xxx exceeds 100 columns. 3. ERROR: "foo* bar" should be "foo *bar". 4. WARNING: unnecessary whitespace before a quoted newline. 5. WARNING: space prohibited before semicolon. 6. WARNING: suspect code indent for conditional statements. 7. WARNING: braces {} are not necessary for single statement blocks. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Acked-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Stanley.Yang 提交于
Changed from v1: remove unused brace Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 15 1月, 2022 9 次提交
-
-
由 yipechai 提交于
fix compile warning for ras_block_match_default Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Use ARRAY_SIZE to get array length. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yang Li 提交于
Eliminate the follow smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init() warn: inconsistent indenting drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716 amdgpu_ras_error_status_query() warn: if statement not indented drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject() warn: inconsistent indenting Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Yang Li 提交于
Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Reviewed-by: NGuchun Chen <guchun.chen@amd.com> Signed-off-by: NYang Li <yang.lee@linux.alibaba.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
No longer insert ras blocks into ras_list if it already exists in ras_list. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Add ras supported check for register_ras_block. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
Removed redundant ras code. Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1. Move xgmi special error inject function from amdgpu_ras.c to xgmi block. 2. Support to use psp_ras_trigger_error as default error inject function in amdgpu_ras.c. If .ras_error_inject isn't defined in ras block, default error inject function will take effect. v2: squash in warning fix (Alex) Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 yipechai 提交于
1.Modify sdma block to fit for the unified ras block data and ops. 2.Change amdgpu_sdma_ras_funcs to amdgpu_sdma_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of sdma ras variable so that sdma ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register sdma ras block into amdgpu device ras block link list. 5.Remove the redundant code about sdma in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of sdma versions. If .ras_late_init and .ras_fini had been defined by the selected sdma version, the defined functions will take effect; if not defined, default fill them with amdgpu_sdma_ras_late_init and amdgpu_sdma_ras_fini. v2: squash in warning fix (Alex) Signed-off-by: Nyipechai <YiPeng.Chai@amd.com> Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: NJohn Clements <john.clements@amd.com> Reviewed-by: NTao Zhou <tao.zhou1@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-