提交 · 425d7a87e54ee358f580eaf10cf28dc95f7121c1 · openeuler / Kernel

16 3月, 2022 1 次提交

drm/amdgpu: message smu to update bad channel info · 69691c82

由 Stanley.Yang 提交于 3月 03, 2022

It should notice SMU to update bad channel info when detected
uncorrectable error in UMC block
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

69691c82

03 3月, 2022 2 次提交

drm/amdgpu: Remove redundant .ras_fini initialization in some ras blocks · 80e0c2cb

由 yipechai 提交于 2月 17, 2022

1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as
   .ras_fini common function, which is called when
   .ras_fini of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_fini to
   initialize .ras_fini in ras blocks.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

80e0c2cb

drm/amdgpu: centrally calls the .ras_fini function of all ras blocks · 1f211a82

由 yipechai 提交于 2月 14, 2022

centrally calls the .ras_fini function of all ras blocks.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1f211a82

24 2月, 2022 2 次提交

drm/amdgpu: Fixed warning reported by kernel test robot · 29c9b6cd

由 yipechai 提交于 2月 23, 2022

Fixed warning reported by kernel test robot:
1.warning: variable 'ras_obj' is used uninitialized
  whenever '||' condition is true.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

29c9b6cd

drm/amdgpu: Change amdgpu_ras_block_late_init_default function scope · d41ff22a

由 Maíra Canal 提交于 2月 22, 2022

Turn previously global function into a static function to avoid the
following Clang warning:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:5: warning: no previous prototype
for function 'amdgpu_ras_block_late_init_default' [-Wmissing-prototypes]
int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev,
    ^
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2459:1: note: declare 'static' if the
function is not intended to be used outside of this translation unit
int amdgpu_ras_block_late_init_default(struct amdgpu_device *adev,
^
static
Signed-off-by: NMaíra Canal <maira.canal@usp.br>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d41ff22a

18 2月, 2022 3 次提交

drm/amdgpu: fix amdgpu_ras_block_late_init error handler · 779596ce

由 Tom Rix 提交于 2月 17, 2022

Clang build fails with
amdgpu_ras.c:2416:7: error: variable 'ras_obj' is used uninitialized
  whenever 'if' condition is true
  if (adev->in_suspend || amdgpu_in_reset(adev)) {
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

amdgpu_ras.c:2453:6: note: uninitialized use occurs here
 if (ras_obj->ras_cb)
     ^~~~~~~

There is a logic error in the error handler's labels.
ex/ The sysfs: is the last goto label in the normal code but
is the middle of error handler.  Rework the error handler.

cleanup: is the first error, so it's handler should be last.

interrupt: is the second error, it's handler is next.  interrupt:
handles the failure of amdgpu_ras_interrupt_add_hander() by
calling amdgpu_ras_interrupt_remove_handler().  This is wrong,
remove() assumes the interrupt has been setup, not torn down by
add().  Change the goto label to cleanup.

sysfs is the last error, it's handler should be first.  sysfs:
handles the failure of amdgpu_ras_sysfs_create() by calling
amdgpu_ras_sysfs_remove().  But when the create() fails there
is nothing added so there is nothing to remove.  This error
handler is not needed. Remove the error handler and change
goto label to interrupt.

Fixes: b293e891 ("drm/amdgpu: add helper function to do common ras_late_init/fini (v3)")
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NTom Rix <trix@redhat.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

779596ce

drm/amdgpu: Remove redundant .ras_late_init initialization in some ras blocks · 418abce2

由 yipechai 提交于 2月 15, 2022

1. Define amdgpu_ras_block_late_init_default in amdgpu_ras.c as
   .ras_late_init common function, which is called when
   .ras_late_init of ras block isn't initialized.
2. Remove the code of using amdgpu_ras_block_late_init to
   initialize .ras_late_init in ras blocks.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

418abce2

drm/amdgpu: define amdgpu_ras_late_init to call all ras blocks' .ras_late_init · 867e24ca

由 yipechai 提交于 2月 14, 2022

Define amdgpu_ras_late_init to call all ras blocks' .ras_late_init.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

867e24ca

15 2月, 2022 4 次提交

drm/amdgpu: Merge amdgpu_ras_late_init/amdgpu_ras_late_fini to... · 563285c8

由 yipechai 提交于 2月 09, 2022

drm/amdgpu: Merge amdgpu_ras_late_init/amdgpu_ras_late_fini to amdgpu_ras_block_late_init/amdgpu_ras_block_late_fini

1. Merge amdgpu_ras_late_init to
   amdgpu_ras_block_late_init.
2. Remove amdgpu_ras_late_init since no ras block
   calls amdgpu_ras_late_init.
3. Merge amdgpu_ras_late_fini to
   amdgpu_ras_block_late_fini.
4. Remove amdgpu_ras_late_fini since no ras block
   calls amdgpu_ras_late_fini.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

563285c8

drm/amdgpu: Optimize operating sysfs and interrupt function interface in amdgpu_ras.c · 9252d33d

由 yipechai 提交于 2月 09, 2022

In order to reduce redundant struct conversion, modify
operating sysfs and interrupt function interface parameters.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9252d33d

drm/amdgpu: Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code · 80ed77f9

由 yipechai 提交于 2月 08, 2022

Optimize amdgpu_nbio_ras_late_init/amdgpu_nbio_ras_fini function code.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

80ed77f9

drm/amdgpu: Optimize xxx_ras_late_init/xxx_ras_late_fini for each ras block · bdb3489c

由 yipechai 提交于 1月 30, 2022

1. Define amdgpu_ras_block_late_init to create sysfs nodes
   and interrupt handles.
2. Define amdgpu_ras_block_late_fini to remove sysfs nodes
   and interrupt handles.
3. Replace ras block variable members in struct
   amdgpu_ras_block_object with struct ras_common_if, which
   can make it easy to associate each ras block instance
   with each ras block functional interface.
4. Add .ras_cb to struct amdgpu_ras_block_object.
5. Change each ras block to fit for the changement of struct
   amdgpu_ras_block_object.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bdb3489c

08 2月, 2022 3 次提交

Revert "drm/amdgpu: Add judgement to avoid infinite loop" · a50b0482

由 yipechai 提交于 2月 07, 2022

The commit d5e8ff5f ("drm/amdgpu: Fixed the defect of soft lock caused by infinite loop")
had fixed this defect.

Revert workaround
commit a2170b4a ("drm/amdgpu: Add judgement to avoid infinite loop").
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a50b0482

drm/amdgpu: Fixed the defect of soft lock caused by infinite loop · d5e8ff5f

由 yipechai 提交于 1月 29, 2022

1. The infinite loop case only occurs on multiple cards support
   ras functions.
2. The explanation of root cause refer to commit 76641cbbf196
   ("drm/amdgpu: Add judgement to avoid infinite loop").
3. Create new node to manage each unique ras instance to guarantee
   each device .ras_list is completely independent.
4. Fixes: commit 7a6b8ab3231b51 ("drm/amdgpu: Unify ras block
   interface for each ras block").
5. The soft locked logs are as follows:
[  262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G           OE     5.13.0-27-generic #29~20.04.1-Ubuntu
[  262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020
[  262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[  262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu]
[  262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45
[  262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202
[  262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248
[  262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000
[  262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001
[  262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e
[  262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003
[  262.166258] FS:  0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000
[  262.166261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0
[  262.166267] Call Trace:
[  262.166272]  amdgpu_ras_do_recovery+0x130/0x290 [amdgpu]
[  262.166529]  ? psi_task_switch+0xd2/0x250
[  262.166537]  ? __switch_to+0x11d/0x460
[  262.166542]  ? __switch_to_asm+0x36/0x70
[  262.166549]  process_one_work+0x220/0x3c0
[  262.166556]  worker_thread+0x4d/0x3f0
[  262.166560]  ? process_one_work+0x3c0/0x3c0
[  262.166563]  kthread+0x12b/0x150
[  262.166568]  ? set_kthread_struct+0x40/0x40
[  262.166571]  ret_from_fork+0x22/0x30
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d5e8ff5f

drm/amdgpu: Print once if RAS unsupported · afa37315

由 Luben Tuikov 提交于 2月 03, 2022

MESA polls for errors every 2-3 seconds. Printing with dev_info() causes
the dmesg log to fill up with the same message, e.g,

[18028.206676] amdgpu 0000:0b:00.0: amdgpu: df doesn't config ras function.

Make it dev_dbg_once(), as it isn't something correctible during boot or
thereafter, so printing just once is sufficient. Also sanitize the message.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: yipechai <YiPeng.Chai@amd.com>
Fixes: 8b0fb0e9 ("drm/amdgpu: Modify gfx block to fit for the unified ras block data and ops")
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

afa37315

03 2月, 2022 1 次提交

drm/amdgpu: Add judgement to avoid infinite loop · a2170b4a

由 yipechai 提交于 1月 29, 2022

1. The infinite loop causing soft lock occurs on multiple amdgpu cards
   supporting ras feature.
2. This a workaround patch to fix 6492e1b0.
   It is valid for multiple amdgpu cards of the same type.
3. The root cause is that each GPU card device has a separate .ras_list
   link header, but the instance and linked list node of each ras block
   are unique. When each device is initialized, each ras instance will
   repeatedly add link node to the device every time. In this way, only
   the .ras_list of the last initialized device is completely correct.
   the .ras_list->prev and .ras_list->next of the device initialzied
   before can still point to the correct ras instance, but the prev
   pointer and next pointer of the pointed ras instance both point to
   the last initialized device's .ras_ list instead of the beginning
   .ras_ list. When using list_for_each_entry_safe searches for
   non-existent Ras nodes on devices other than the last device, the
   last ras instance next pointer cannot always be equal to the
   beginning .ras_list, so that the loop cannot be terminated, the
   program enters a infinite loop.
 BTW: Since the data and initialization process of each card are the same,
      the link list between ras instances will not be destroyed every time
      the device is initialized.
 4. The soft locked logs are as follows:
[  262.165690] CPU: 93 PID: 758 Comm: kworker/93:1 Tainted: G           OE     5.13.0-27-generic #29~20.04.1-Ubuntu
[  262.165695] Hardware name: Supermicro AS -4124GS-TNR/H12DSG-O-CPU, BIOS T20200717143848 07/17/2020
[  262.165698] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
[  262.165980] RIP: 0010:amdgpu_ras_get_ras_block+0x86/0xd0 [amdgpu]
[  262.166239] Code: 68 d8 4c 8d 71 d8 48 39 c3 74 54 49 8b 45 38 48 85 c0 74 32 44 89 fa 44 89 e6 4c 89 ef e8 82 e4 9b dc 85 c0 74 3c 49 8b 46 28 <49> 8d 56 28 4d 89 f5 48 83 e8 28 48 39 d3 74 25 49 89 c6 49 8b 45
[  262.166243] RSP: 0018:ffffac908fa87d80 EFLAGS: 00000202
[  262.166247] RAX: ffffffffc1394248 RBX: ffff91e4ab8d6e20 RCX: ffffffffc1394248
[  262.166249] RDX: ffff91e4aa356e20 RSI: 000000000000000e RDI: ffff91e4ab8c0000
[  262.166252] RBP: ffffac908fa87da8 R08: 0000000000000007 R09: 0000000000000001
[  262.166254] R10: ffff91e4930b64ec R11: 0000000000000000 R12: 000000000000000e
[  262.166256] R13: ffff91e4aa356df8 R14: ffffffffc1394320 R15: 0000000000000003
[  262.166258] FS:  0000000000000000(0000) GS:ffff92238fb40000(0000) knlGS:0000000000000000
[  262.166261] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  262.166264] CR2: 00000001004865d0 CR3: 000000406d796000 CR4: 0000000000350ee0
[  262.166267] Call Trace:
[  262.166272]  amdgpu_ras_do_recovery+0x130/0x290 [amdgpu]
[  262.166529]  ? psi_task_switch+0xd2/0x250
[  262.166537]  ? __switch_to+0x11d/0x460
[  262.166542]  ? __switch_to_asm+0x36/0x70
[  262.166549]  process_one_work+0x220/0x3c0
[  262.166556]  worker_thread+0x4d/0x3f0
[  262.166560]  ? process_one_work+0x3c0/0x3c0
[  262.166563]  kthread+0x12b/0x150
[  262.166568]  ? set_kthread_struct+0x40/0x40
[  262.166571]  ret_from_fork+0x22/0x30

Fixes: 6492e1b0 ("drm/amdgpu: Unify ras block interface for each ras block")
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a2170b4a

28 1月, 2022 1 次提交

drm/amdgpu: add umc_fill_error_record to make code more simple · 400013b2

由 Tao Zhou 提交于 1月 19, 2022

Create common amdgpu_umc_fill_error_record function for all versions
of UMC and clean up related codes.
Signed-off-by: NTao Zhou <tao.zhou1@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

400013b2

26 1月, 2022 1 次提交

Revert "drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list" · e9287ef8

由 yipechai 提交于 1月 19, 2022

This reverts commit df4f0041.

Xgmi ras initialization had been moved from .late_init to early_init,
the defect of repeated calling amdgpu_ras_register_ras_block had been
fixed, so revert this patch.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e9287ef8

20 1月, 2022 1 次提交

drm/amdgpu: Remove repeated calls · 8eb53bb2

由 yipechai 提交于 1月 17, 2022

Remove repeated calls.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8eb53bb2

19 1月, 2022 2 次提交

drm/amdgpu: Fix the code style warnings in amdgpu_ras · b6efdb02

由 yipechai 提交于 1月 14, 2022

Fix the code style warnings in amdgpu_ras:
1. ERROR: space required before the open parenthesis '('.
2. WARNING: line length of xxx exceeds 100 columns.
3. ERROR: "foo* bar" should be "foo *bar".
4. WARNING: unnecessary whitespace before a quoted newline.
5. WARNING: space prohibited before semicolon.
6. WARNING: suspect code indent for conditional statements.
7. WARNING: braces {} are not necessary for single statement blocks.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b6efdb02

drm/amdgpu: handle denied inject error into critical regions v2 · 79c04621

由 Stanley.Yang 提交于 1月 11, 2022

Changed from v1:
    remove unused brace
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

79c04621

15 1月, 2022 18 次提交

drm/amdgpu: fix compile warning for ras_block_match_default · e3d833f4

由 yipechai 提交于 1月 13, 2022

fix compile warning for ras_block_match_default
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e3d833f4

drm/amdgpu: Use ARRAY_SIZE to get array length · 954ea6aa

由 yipechai 提交于 1月 13, 2022

Use ARRAY_SIZE to get array length.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

954ea6aa

drm/amdgpu: clean up some inconsistent indenting · ab3b9de6

由 Yang Li 提交于 1月 13, 2022

Eliminate the follow smatch warnings:
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3504 amdgpu_device_init()
warn: inconsistent indenting
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1716
amdgpu_ras_error_status_query() warn: if statement not indented
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1058 amdgpu_ras_error_inject()
warn: inconsistent indenting
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NYang Li <yang.lee@linux.alibaba.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ab3b9de6

drm/amdgpu: remove unneeded semicolon · 69f91d32

由 Yang Li 提交于 1月 13, 2022

Eliminate the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2725:16-17: Unneeded semicolon
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NYang Li <yang.lee@linux.alibaba.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

69f91d32

drm/amdgpu: No longer insert ras blocks into ras_list if it already exists in ras_list · df4f0041

由 yipechai 提交于 1月 12, 2022

No longer insert ras blocks into ras_list if it already exists in ras_list.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

df4f0041

drm/amdgpu: Add ras supported check for register_ras_block · df01fe73

由 yipechai 提交于 1月 11, 2022

Add ras supported check for register_ras_block.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

df01fe73

drm/amdgpu: Removed redundant ras code · 7389a5b8

由 yipechai 提交于 1月 05, 2022

Removed redundant ras code.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7389a5b8

drm/amdgpu: Adjust error inject function code style in amdgpu_ras.c · 22d4ba53

由 yipechai 提交于 1月 05, 2022

1. Move xgmi special error inject function from amdgpu_ras.c to xgmi block.
2. Support to use psp_ras_trigger_error as default error inject function in amdgpu_ras.c. If .ras_error_inject isn't defined in ras block, default error inject function will take effect.

v2: squash in warning fix (Alex)
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

22d4ba53

drm/amdgpu: Modify sdma block to fit for the unified ras block data and ops · bdc4292b

由 yipechai 提交于 1月 05, 2022

1.Modify sdma block to fit for the unified ras block data and ops.
2.Change amdgpu_sdma_ras_funcs to amdgpu_sdma_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of sdma ras variable so that sdma ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register sdma ras block into amdgpu device ras block link list.
5.Remove the redundant code about sdma in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of sdma versions. If .ras_late_init and .ras_fini had been defined by the selected sdma version, the defined functions will take effect; if not defined, default fill them with amdgpu_sdma_ras_late_init and amdgpu_sdma_ras_fini.

bdc4292b

drm/amdgpu: Modify umc block to fit for the unified ras block data and ops · efe17d5a

由 yipechai 提交于 1月 06, 2022

1.Modify umc block to fit for the unified ras block data and ops.
2.Change amdgpu_umc_ras_funcs to amdgpu_umc_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of umc ras variable so that umc ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register umc ras block into amdgpu device ras block link list.
5.Remove the redundant code about umc in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of umc versions. If .ras_late_init and .ras_fini had been defined by the selected umc version, the defined functions will take effect; if not defined, default fill them with amdgpu_umc_ras_late_init and amdgpu_umc_ras_fini.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

efe17d5a

drm/amdgpu: Modify nbio block to fit for the unified ras block data and ops · 2e54fe5d

由 yipechai 提交于 1月 05, 2022

1.Modify nbio block to fit for the unified ras block data and ops.
2.Change amdgpu_nbio_ras_funcs to amdgpu_nbio_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of mmhub ras variable so that nbio ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register nbio ras block into amdgpu device ras block link list.
5.Remove the redundant code about nbio in amdgpu_ras.c after using the unified ras block.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2e54fe5d

drm/amdgpu: Modify mmhub block to fit for the unified ras block data and ops · 5e67bba3

由 yipechai 提交于 1月 04, 2022

1.Modify mmhub block to fit for the unified ras block data and ops.
2.Change amdgpu_mmhub_ras_funcs to amdgpu_mmhub_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of mmhub ras variable so that mmhub ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register mmhub ras block into amdgpu device ras block link list. 5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block.
5.Remove the redundant code about mmhub in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of mmhub versions. If .ras_late_init and .ras_fini had been defined by the selected mmhub version, the defined functions will take effect; if not defined, default fill them with amdgpu_mmhub_ras_late_init and amdgpu_mmhub_ras_fini.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5e67bba3

drm/amdgpu: Modify hdp block to fit for the unified ras block data and ops · 6d76e904

由 yipechai 提交于 1月 04, 2022

1.Modify hdp block to fit for the unified ras block data and ops.
2.Change amdgpu_hdp_ras_funcs to amdgpu_hdp_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of hdp ras variable so that hdp ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register hdp ras block into amdgpu device ras block link list.
5.Remove the redundant code about hdp in amdgpu_ras.c after using the unified ras block.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6d76e904

drm/amdgpu: Modify xgmi block to fit for the unified ras block data and ops · 6c245386

由 yipechai 提交于 1月 04, 2022

1.Modify gmc block to fit for the unified ras block data and ops.
2.Change amdgpu_xgmi_ras_funcs to amdgpu_xgmi_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of gmc ras variable so that gmc ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register gmc ras block into amdgpu device ras block link list.
5.Remove the redundant code about gmc in amdgpu_ras.c after using the unified ras block.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6c245386

drm/amdgpu: Modify gfx block to fit for the unified ras block data and ops · 8b0fb0e9

由 yipechai 提交于 1月 04, 2022

1.Modify gfx block to fit for the unified ras block data and ops.
2.Change amdgpu_gfx_ras_funcs to amdgpu_gfx_ras, and the corresponding variable name remove _funcs suffix.
3.Remove the const flag of gfx ras variable so that gfx ras block can be able to be inserted into amdgpu device ras block link list.
4.Invoke amdgpu_ras_register_ras_block function to register gfx ras block into amdgpu device ras block link list.
5.Remove the redundant code about gfx in amdgpu_ras.c after using the unified ras block.
6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of gfx versions. If .ras_late_init and .ras_fini had been defined by the selected gfx version, the defined functions will take effect; if not defined, default fill with amdgpu_gfx_ras_late_init and amdgpu_gfx_ras_fini.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8b0fb0e9

drm/amdgpu: Modify the compilation failed problem when other ras blocks' .h include amdgpu_ras.h · 7cab2124

由 yipechai 提交于 1月 04, 2022

Modify the compilation failed problem when other ras blocks' .h include amdgpu_ras.h.

v2: squash in forward declaration warning fix (Alex)
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7cab2124

drm/amdgpu: Unify ras block interface for each ras block · 6492e1b0

由 yipechai 提交于 1月 04, 2022

1. Define unified ops interface for each block.
2. Add ras_block_match function pointer in ops interface, each ras block can customize specail match function to identify itself.
3. Add amdgpu_ras_block_match_default new function. If a ras block doesn't define .ras_block_match, default execute amdgpu_ras_block_match_default to identify this ras block.
4. Define unified basic ras block data for each ras block.
5. Create dedicated amdgpu device ras block link list to manage all of the ras blocks.
6. Add amdgpu_ras_register_ras_block new function interface for each ras block to register itself to ras controlling block.
Signed-off-by: Nyipechai <YiPeng.Chai@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6492e1b0

drm/amd/pm: do not expose implementation details to other blocks out of power · bc143d8b

由 Evan Quan 提交于 11月 22, 2021

Those implementation details(whether swsmu supported, some ppt_funcs supported,
accessing internal statistics ...)should be kept internally. It's not a good
practice and even error prone to expose implementation details.
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Reviewed-by: NLijo Lazar <lijo.lazar@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bc143d8b

12 1月, 2022 1 次提交

drm/amdgpu: do not pass ttm_resource_manager to vram_mgr · ec6aae97

由 Nirmoy Das 提交于 1月 07, 2022

Do not allow exported amdgpu_vram_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call vram_mgr functions.

v2: pass adev's vram_mgr instead of adev
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NNirmoy Das <nirmoy.das@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ec6aae97

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功