提交 · 158225294683310566445f8477336e747b74f03f · openeuler / Kernel

“5ad5a8199e3b3e1162af50c0a52eca678a259d67”上不存在“deploy/pptracking/python/mot_jde_infer.py”

14 9月, 2022 1 次提交

drm/amdgpu: Add EEPROM I2C address for smu v13_0_0 · 15822529

由 Candice Li 提交于 9月 07, 2022

Set correct EEPROM I2C address for smu v13_0_0.
Signed-off-by: NCandice Li <candice.li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

15822529

16 3月, 2022 1 次提交

drm/amdgpu: message smu to update bad channel info · 69691c82

由 Stanley.Yang 提交于 3月 03, 2022

It should notice SMU to update bad channel info when detected
uncorrectable error in UMC block
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

69691c82

12 2月, 2022 1 次提交

drm/amdgpu: Reset OOB table error count info · 8bbd4d83

由 Stanley.Yang 提交于 2月 11, 2022

The OOB table error count info should be reset after reset
eeprom table
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NTao Zhou <tao.zhou1@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8bbd4d83

10 2月, 2022 1 次提交

drm/amdgpu: Move reset sem into reset_domain · d0fb18b5

由 Andrey Grodzovsky 提交于 1月 19, 2022

We want single instance of reset sem across all
reset clients because in case of XGMI we should stop
access cross device MMIO because any of them could be
in a reset in the moment.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html

d0fb18b5

28 1月, 2022 1 次提交

drm/amd: Expose the FRU SMU I2C bus · 2f60dd50

由 Luben Tuikov 提交于 1月 19, 2022

Expose both SMU I2C buses. Some boards use the same bus for both the RAS
and FRU EEPROMs and others use different buses.  This enables the
additional I2C bus and sets the right buses to use for RAS and FRU EEPROM
access.

Cc: Roy Sun <Roy.Sun@amd.com>
Co-developed-by: NAlex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

2f60dd50

29 10月, 2021 2 次提交

drm/amdgpu: Add kernel parameter support for ignoring bad page threshold · 68daadf3

由 Kent Russell 提交于 10月 19, 2021

When a GPU hits the bad_page_threshold, it will not be initialized by
the amdgpu driver. This means that the table cannot be cleared, nor can
information gathering be performed (getting serial number, BDF, etc).

If the bad_page_threshold kernel parameter is set to -2,
continue to initialize the GPU, while printing a warning to dmesg that
this action has been done

v2: squash in Luben's fix to restore RAS info reporting

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: NKent Russell <kent.russell@amd.com>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

68daadf3

drm/amdgpu: Warn when bad pages approaches 90% threshold · 8483fdfe

由 Kent Russell 提交于 10月 12, 2021

dmesg doesn't warn when the number of bad pages approaches the
threshold for page retirement. WARN when the number of bad pages
is at 90% or greater for easier checks and planning, instead of waiting
until the GPU is full of bad pages.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: NKent Russell <kent.russell@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8483fdfe

20 10月, 2021 1 次提交

drm/amdgpu: Clarify error when hitting bad page threshold · dcd5ea9f

由 Kent Russell 提交于 10月 19, 2021

Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.

Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Mukul Joshi <Mukul.Joshi@amd.com>
Signed-off-by: NKent Russell <kent.russell@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dcd5ea9f

24 9月, 2021 1 次提交

drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count · 6cd1f9b4

由 Michel Dänzer 提交于 9月 09, 2021

This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
                 from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
 to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
   90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
 1985 |         max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d4670 "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: NLyude Paul <lyude@redhat.com>
Signed-off-by: NMichel Dänzer <mdaenzer@redhat.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6cd1f9b4

16 9月, 2021 1 次提交

drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count · 114518ff

由 Michel Dänzer 提交于 9月 09, 2021

This was unusual; normally, inline functions are declared static as
well, and defined in a header file if used by multiple compilation
units. The latter would be more involved in this case, so just drop
the inline declaration for now.

Fixes compile failure building for ppc64le on RHEL 8:

In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32,
                 from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’:
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call
 to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available
   90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here
 1985 |         max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count();
      |                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: c84d4670 "drm/amdgpu: validate bad page threshold in ras(v3)"
Reviewed-by: NLyude Paul <lyude@redhat.com>
Signed-off-by: NMichel Dänzer <mdaenzer@redhat.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

114518ff

31 8月, 2021 1 次提交

drm/amdgpu: Process any VBIOS RAS EEPROM address · cc947bf9

由 Luben Tuikov 提交于 8月 25, 2021

We can now process any RAS EEPROM address from
VBIOS. Generalize so as to compute the top three
bits of the 19-bit EEPROM address, from any byte
returned as the "i2c address" from VBIOS.

Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cc947bf9

06 8月, 2021 2 次提交

drm/amdgpu: set RAS EEPROM address from VBIOS · 39932ef7

由 John Clements 提交于 8月 04, 2021

update to latest atombios fw table

[Backport to 5.14 - Alex]

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1670Signed-off-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

39932ef7

drm/amdgpu: set RAS EEPROM address from VBIOS · 14fb496a

由 John Clements 提交于 8月 04, 2021

update to latest atombios fw table
Signed-off-by: NJohn Clements <john.clements@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

14fb496a

09 7月, 2021 3 次提交

drm/amdgpu: return -EFAULT if copy_to_user() fails · 64598e23

由 Dan Carpenter 提交于 7月 03, 2021

If copy_to_user() fails then this should return -EFAULT instead of
-EINVAL.

Fixes: c65b0805 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

64598e23

drm/amdgpu: unlock on error in amdgpu_ras_debugfs_table_read() · b8badd50

由 Dan Carpenter 提交于 7月 03, 2021

This error path needs to unlock before returning.  While we're at it,
the correct error code from copy_to_user() failure is -EFAULT, not
-EINVAL.

Fixes: c65b0805 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b8badd50

drm/amdgpu: fix a signedness bug in __verify_ras_table_checksum() · 3006c924

由 Dan Carpenter 提交于 7月 03, 2021

If amdgpu_eeprom_read() returns a negative error code then the error
handling checks:

	if (res < buf_size) {

The problem is that "buf_size" is a u32 so negative values are type
promoted to a high positive values and the condition is false.  Fix
this by changing the type of "buf_size" to int.

Fixes: 63d4c081 ("drm/amdgpu: Optimize EEPROM RAS table I/O")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3006c924

01 7月, 2021 18 次提交

drm/amdgpu: fix 64 bit divide in eeprom code · d456f387

由 Alex Deucher 提交于 6月 30, 2021

pos is 64 bits.

Fixes: c65b0805 ("drm/amdgpu: RAS EEPROM table is now in debugfs")
Cc: luben.tuikov@amd.com
Reported-by: Nkernel test robot <lkp@intel.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d456f387

drm/amdgpu: RAS EEPROM table is now in debugfs · c65b0805

由 Luben Tuikov 提交于 4月 08, 2021

Add "ras_eeprom_size" file in debugfs, which
reports the maximum size allocated to the RAS
table in EEROM, as the number of bytes and the
number of records it could store. For instance,

$cat /sys/kernel/debug/dri/0/ras/ras_eeprom_size
262144 bytes or 10921 records
$_

Add "ras_eeprom_table" file in debugfs, which
dumps the RAS table stored EEPROM, in a formatted
way. For instance,

$cat ras_eeprom_table
 Signature    Version  FirstOffs       Size   Checksum
0x414D4452 0x00010000 0x00000014 0x000000EC 0x000000DA
Index  Offset ErrType Bank/CU          TimeStamp      Offs/Addr MemChl MCUMCID    RetiredPage
    0 0x00014      ue    0x00 0x00000000607608DC 0x000000000000   0x00    0x00 0x000000000000
    1 0x0002C      ue    0x00 0x00000000607608DC 0x000000001000   0x00    0x00 0x000000000001
    2 0x00044      ue    0x00 0x00000000607608DC 0x000000002000   0x00    0x00 0x000000000002
    3 0x0005C      ue    0x00 0x00000000607608DC 0x000000003000   0x00    0x00 0x000000000003
    4 0x00074      ue    0x00 0x00000000607608DC 0x000000004000   0x00    0x00 0x000000000004
    5 0x0008C      ue    0x00 0x00000000607608DC 0x000000005000   0x00    0x00 0x000000000005
    6 0x000A4      ue    0x00 0x00000000607608DC 0x000000006000   0x00    0x00 0x000000000006
    7 0x000BC      ue    0x00 0x00000000607608DC 0x000000007000   0x00    0x00 0x000000000007
    8 0x000D4      ue    0x00 0x00000000607608DD 0x000000008000   0x00    0x00 0x000000000008
$_

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c65b0805

drm/amdgpu: Optimize EEPROM RAS table I/O · 63d4c081

由 Luben Tuikov 提交于 4月 06, 2021

Split functionality between read and write, which
simplifies the code and exposes areas of
optimization and more or less complexity, and take
advantage of that.

Read and write the table in one go; use a separate
stage to decode or encode the data, as opposed to
on the fly, which keeps the I2C bus busy. Use a
single read/write to read/write the table or at
most two if the number of records we're
reading/writing wraps around.

Check the check-sum of a table in EEPROM on init.

Update the checksum at the same time as when
updating the table header signature, when the
threshold was increased on boot.

Take advantage of arithmetic modulo 256, that is,
use a byte!, to greatly simplify checksum
arithmetic.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

63d4c081

drm/amdgpu: Get rid of test function · 017dad64

由 Luben Tuikov 提交于 6月 16, 2021

The code is now tested from userspace.
Remove already macroed out test function.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

017dad64

drm/amdgpu: Some renames · 0686627b

由 Luben Tuikov 提交于 6月 16, 2021

Qualify with "ras_". Use kernel's own--don't
redefine your own.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

0686627b

drm/amdgpu: Nerf buff · d7edde3d

由 Luben Tuikov 提交于 6月 16, 2021

buff --> buf. Essentially buffer abbreviates to
buf, remove 1/2 of it, or just the iron part, as
opposed to just the Er,

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d7edde3d

drm/amdgpu: Use explicit cardinality for clarity · e4e6a589

由 Luben Tuikov 提交于 3月 27, 2021

RAS_MAX_RECORD_NUM may mean the maximum record
number, as in the maximum house number on your
street, or it may mean the maximum number of
records, as in the count of records, which is also
a number. To make this distinction whether the
number is ordinal (index) or cardinal (count),
rename this macro to RAS_MAX_RECORD_COUNT.

This makes it easy to understand what it refers
to, especially when we compute quantities such as,
how many records do we have left in the table,
especially when there are so many other numbers,
quantities and numerical macros around.

Also rename the long,
amdgpu_ras_eeprom_get_record_max_length() to the
more succinct and clear,
amdgpu_ras_eeprom_max_record_count().

When computing the threshold, which also deals
with counts, i.e. "how many", use cardinal
"max_eeprom_records_count", than the quantitative
"max_eeprom_records_len".

Simplify the logic here and there, as well.

Cc: Guchun Chen <guchun.chen@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e4e6a589

drm/amdgpu: Simplify RAS EEPROM checksum calculations · 803c6ebd

由 Luben Tuikov 提交于 3月 26, 2021

Rename update_table_header() to
write_table_header() as this function is actually
writing it to EEPROM.

Use kernel types; use u8 to carry around the
checksum, in order to take advantage of arithmetic
modulo 8-bits (256).

Tidy up to 80 columns.

When updating the checksum, just recalculate the
whole thing.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

803c6ebd

drm/amdgpu: Fix amdgpu_ras_eeprom_init() · dce4400e

由 Luben Tuikov 提交于 3月 26, 2021

No need to account for the 2 bytes of EEPROM
address--this is now well abstracted away by
the fixes the the lower layers.

Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

dce4400e

drm/amdgpu: Return result fix in RAS · cf696091

由 Luben Tuikov 提交于 3月 11, 2021

The low level EEPROM write method, doesn't return
1, but the number of bytes written. Thus do not
compare to 1, instead, compare to greater than 0
for success.

Other cleanup: if the lower layers returned
-errno, then return that, as opposed to
overwriting the error code with one-fits-all
-EINVAL. For instance, some return -EAGAIN.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cf696091

drm/amdgpu: EEPROM: add explicit read and write · 16ef7977

由 Luben Tuikov 提交于 3月 11, 2021

Add explicit amdgpu_eeprom_read() and
amdgpu_eeprom_write() for clarity.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

16ef7977

drm/amdgpu: RAS xfer to read/write · 1fab841f

由 Luben Tuikov 提交于 3月 11, 2021

Wrap amdgpu_ras_eeprom_xfer(..., bool write),
into amdgpu_ras_eeprom_read() and
amdgpu_ras_eeprom_write(), as that makes reading
and understanding the code clearer.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1fab841f

drm/amdgpu: Rename misspelled function · a4399657

由 Luben Tuikov 提交于 2月 05, 2021

Instead of fixing the spelling in
  amdgpu_ras_eeprom_process_recods(),
rename it to,
  amdgpu_ras_eeprom_xfer(),
to look similar to other I2C and protocol
transfer (read/write) functions.

Also to keep the column span to within reason by
using a shorter name.

Change the "num" function parameter from "int" to
"const u32" since it is the number of items
(records) to xfer, i.e. their count, which cannot
be a negative number.

Also swap the order of parameters, keeping the
pointer to records and their number next to each
other, while the direction now becomes the last
parameter.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a4399657

drm/amdgpu: RAS: EEPROM --> RAS · c28aa44d

由 Luben Tuikov 提交于 2月 04, 2021

In amdgpu_ras_eeprom.c--the interface from RAS to
EEPROM, rename macros from EEPROM to RAS, to
indicate that the quantities and objects are RAS
specific, not EEPROM. We can decrease the RAS
table, or put it in different offset of EEPROM as
needed in the future.

Remove EEPROM_ADDRESS_SIZE macro definition, equal
to 2, from the file and calculations, as that
quantity is computed and added on the stack,
in the lower layer, amdgpu_eeprom_xfer().

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c28aa44d

drm/amdgpu: Fix wrap-around bugs in RAS · edb63a53

由 Luben Tuikov 提交于 2月 02, 2021

Fix the size of the EEPROM from 256000 bytes
to 262144 bytes (256 KiB).

Fix a couple or wrap around bugs. If a valid
value/address is 0 <= addr < size, the inverse of
this inequality (barring negative values which
make no sense here) is addr >= size. Fix this in
the RAS code.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

edb63a53

drm/amdgpu: RAS and FRU now use 19-bit I2C address · ccdfbfec

由 Luben Tuikov 提交于 2月 02, 2021

Convert RAS and FRU code to use the 19-bit I2C
memory address and remove all "slave_addr", as
this is now absolved into the 19-bit address.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: John Clements <john.clements@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ccdfbfec

drm/amdgpu: i2c subsystem uses 7 bit addresses · 39ed82d1

由 Alex Deucher 提交于 1月 21, 2021

Convert from 8 bit to 7 bit.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>

39ed82d1

drm/amdgpu/ras: switch ras eeprom handling to use generic helper · 24f55c05

由 Alex Deucher 提交于 1月 21, 2021

Use the new helper rather than doing i2c transfers directly.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>

24f55c05

10 4月, 2021 1 次提交

drm/amdgpu: enable ras eeprom on aldebaran · 52a9df81

由 John Clements 提交于 4月 08, 2021

enable ras eeprom loading by default on aldebaran
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

52a9df81

24 3月, 2021 2 次提交

drm/amdgpu: fix send ras disable cmd when asic not support ras · 970fd197

由 Stanley.Yang 提交于 3月 10, 2021

    cause:
	It is necessary to send ras disable command to ras-ta during gfx
	block ras later init, because the ras capability is disable read
	from vbios for vega20 gaming, but the ras context is released
	during ras init process, this will cause send ras disable command
	to ras-to failed.
    how:
	Delay releasing ras context, the ras context
	will be released after gfx block later init done.

Changed from V1:
    move release_ras_context into ras_resume

Changed from V2:
    check BIT(UMC) is more reasonable before access eeprom table
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

970fd197

drm/amdgpu: Reset the devices in the XGMI hive duirng probe · e3c1b071

由 shaoyunl 提交于 2月 16, 2021

In passthrough configuration, hypervisior will trigger the SBR(Secondary bus reset) to the devices
without sync to each other. This could cause device hang since for XGMI configuration, all the devices
within the hive need to be reset at a limit time slot. This serial of patches try to solve this issue
by co-operate with new SMU which will only do minimum house keeping to response the SBR request but don't
do the real reset job and leave it to driver. Driver need to do the whole sw init and minimum HW init
to bring up the SMU and trigger the reset(possibly BACO) on all the ASICs at the same time
Signed-off-by: Nshaoyunl <shaoyun.liu@amd.com>
Acked-by: Andrey Grodzovsky andrey.grodzovsky@amd.com
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e3c1b071

27 2月, 2021 1 次提交

drm/amdgpu: remove unnecessary reading for epprom header · 11003c68

由 Dennis Li 提交于 2月 26, 2021

If the number of badpage records exceed the threshold, driver has
updated both epprom header and control->tbl_hdr.header before gpu reset,
therefore GPU recovery thread no need to read epprom header directly.

v2: merge amdgpu_ras_check_err_threshold into amdgpu_ras_eeprom_check_err_threshold
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

11003c68

07 1月, 2021 1 次提交

drm/amdgpu: enable ras eeprom support for sienna cichlid · 3851c90b

由 John Clements 提交于 1月 05, 2021

added I2C address and asic support flag
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3851c90b

06 1月, 2021 1 次提交

drm/amdgpu: enable ras eeprom support for sienna cichlid · 3e7bc83e

由 John Clements 提交于 1月 05, 2021

added I2C address and asic support flag
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

3e7bc83e

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功