提交 · 63d4c081a556a1e1f200411ad1e34a51965f1048 · openeuler / Kernel

01 7月, 2021 6 次提交

drm/amdgpu: Optimize EEPROM RAS table I/O · 63d4c081

由 Luben Tuikov 提交于 4月 06, 2021

Split functionality between read and write, which
simplifies the code and exposes areas of
optimization and more or less complexity, and take
advantage of that.

Read and write the table in one go; use a separate
stage to decode or encode the data, as opposed to
on the fly, which keeps the I2C bus busy. Use a
single read/write to read/write the table or at
most two if the number of records we're
reading/writing wraps around.

Check the check-sum of a table in EEPROM on init.

Update the checksum at the same time as when
updating the table header signature, when the
threshold was increased on boot.

Take advantage of arithmetic modulo 256, that is,
use a byte!, to greatly simplify checksum
arithmetic.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

63d4c081

drm/amdgpu: Some renames · 0686627b

由 Luben Tuikov 提交于 6月 16, 2021

Qualify with "ras_". Use kernel's own--don't
redefine your own.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

0686627b

drm/amdgpu: Use explicit cardinality for clarity · e4e6a589

由 Luben Tuikov 提交于 3月 27, 2021

RAS_MAX_RECORD_NUM may mean the maximum record
number, as in the maximum house number on your
street, or it may mean the maximum number of
records, as in the count of records, which is also
a number. To make this distinction whether the
number is ordinal (index) or cardinal (count),
rename this macro to RAS_MAX_RECORD_COUNT.

This makes it easy to understand what it refers
to, especially when we compute quantities such as,
how many records do we have left in the table,
especially when there are so many other numbers,
quantities and numerical macros around.

Also rename the long,
amdgpu_ras_eeprom_get_record_max_length() to the
more succinct and clear,
amdgpu_ras_eeprom_max_record_count().

When computing the threshold, which also deals
with counts, i.e. "how many", use cardinal
"max_eeprom_records_count", than the quantitative
"max_eeprom_records_len".

Simplify the logic here and there, as well.

Cc: Guchun Chen <guchun.chen@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e4e6a589

drm/amdgpu: Return result fix in RAS · cf696091

由 Luben Tuikov 提交于 3月 11, 2021

The low level EEPROM write method, doesn't return
1, but the number of bytes written. Thus do not
compare to 1, instead, compare to greater than 0
for success.

Other cleanup: if the lower layers returned
-errno, then return that, as opposed to
overwriting the error code with one-fits-all
-EINVAL. For instance, some return -EAGAIN.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cf696091

drm/amdgpu: RAS xfer to read/write · 1fab841f

由 Luben Tuikov 提交于 3月 11, 2021

Wrap amdgpu_ras_eeprom_xfer(..., bool write),
into amdgpu_ras_eeprom_read() and
amdgpu_ras_eeprom_write(), as that makes reading
and understanding the code clearer.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1fab841f

drm/amdgpu: Rename misspelled function · a4399657

由 Luben Tuikov 提交于 2月 05, 2021

Instead of fixing the spelling in
  amdgpu_ras_eeprom_process_recods(),
rename it to,
  amdgpu_ras_eeprom_xfer(),
to look similar to other I2C and protocol
transfer (read/write) functions.

Also to keep the column span to within reason by
using a shorter name.

Change the "num" function parameter from "int" to
"const u32" since it is the number of items
(records) to xfer, i.e. their count, which cannot
be a negative number.

Also swap the order of parameters, keeping the
pointer to records and their number next to each
other, while the direction now becomes the last
parameter.

Cc: Jean Delvare <jdelvare@suse.de>
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a4399657

19 6月, 2021 2 次提交

drm/amdgpu: message smu to update hbm bad page number · 513befa6

由 Stanley.Yang 提交于 6月 11, 2021

Use SMU to update the bad pages rather than directly
accessing the EEPROM from the driver.
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

513befa6

drm/amdgpu: add vega20 to ras quirk list · e11d5e0d

由 Stanley.Yang 提交于 6月 15, 2021

Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e11d5e0d

12 6月, 2021 1 次提交

drm/amdgpu: use adev_to_drm macro for consistency (v2) · a3fbb0d8

由 Guchun Chen 提交于 6月 09, 2021

Use adev_to_drm() to get to the drm_device pointer.
Signed-off-by: NGuchun Chen <guchun.chen@amd.com>
Reviewed-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a3fbb0d8

28 5月, 2021 2 次提交

drm/amdgpu: Use delayed work to collect RAS error counters · 05adfd80

由 Luben Tuikov 提交于 5月 21, 2021

On Context Query2 IOCTL return the correctable and
uncorrectable errors in O(1) fashion, from cached
values, and schedule a delayed work function to
calculate and cache them for the next such IOCTL.

v2: Cancel pending delayed work at ras_fini().
v3: Remove conditionals when dealing with delayed
    work manipulation as they're inherently racy.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

05adfd80

drm/amdgpu: Fix RAS function interface · a46751fb

由 Luben Tuikov 提交于 5月 18, 2021

The correctable and uncorrectable errors
are calculated at each invocation of this
function. Therefore, it is highly inefficient to
return just one of them based on a Boolean
input. If the caller wants both, twice the work
would be done. (And this work is O(n^3) on
Vega20.)

Fix this "interface" to simply return what it had
calculated--both values. Let the caller choose
what it wants to record, inspect, use.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a46751fb

20 5月, 2021 2 次提交

drm/amdgpu: Split amdgpu_device_fini into early and late · 72c8c97b

由 Andrey Grodzovsky 提交于 5月 12, 2021

Some of the stuff in amdgpu_device_fini such as HW interrupts
disable and pending fences finilization must be done right away on
pci_remove while most of the stuff which relates to finilizing and
releasing driver data structures can be kept until
drm_driver.release hook is called, i.e. when the last device
reference is dropped.

v4: Change functions prefix early->hw and late->sw
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-3-andrey.grodzovsky@amd.com

72c8c97b

drm/amdgpu: Conditionally reset RAS counters on boot · 8f6368a9

由 John Clements 提交于 5月 17, 2021

Only clear RAS error counters if perestent EDC harvesting is not supported
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8f6368a9

11 5月, 2021 11 次提交

drm/amd/amdgpu: Fix errors in function documentation · c666bbf0

由 Dwaipayan Ray 提交于 5月 09, 2021

Fix a couple of syntax errors and removed one excess
parameter in the function documentations which lead
to kernel docs build warning.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NDwaipayan Ray <dwaipayanray1@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c666bbf0

drm/amdgpu: add function to clear MMEA error status for aldebaran · 7780f503

由 Dennis Li 提交于 5月 10, 2021

For aldebaran, hardware will not clear error status automatically when
reading error status register, insteadly driver should set clear bit of
the error status register explicitly to clear error status.
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7780f503

drm/amdgpu: covert ras status to kernel errno · 011907fd

由 Dennis Li 提交于 5月 10, 2021

The original codes use ras status and kernl errno together in the same
function, which is a wrong code style.
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

011907fd

drm/amdgpu: Quit RAS initialization earlier if RAS is disabled · 7ddd9770

由 Oak Zeng 提交于 5月 06, 2021

If RAS is disabled through amdgpu_ras_enable kernel parameter,
we should quit the RAS initialization eariler to avoid initialization
of some RAS data structure such as sysfs etc.
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7ddd9770

drm/amdgpu: Export ras_*_enabled to debugfs · ef0d7d20

由 Luben Tuikov 提交于 5月 04, 2021

Export the runtime-set "ras_hw_enabled" and
"ras_enabled" to debugfs, for debugging.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ef0d7d20

drm/amdgpu: Rename to ras_*_enabled · 8ab0d6f0

由 Luben Tuikov 提交于 5月 04, 2021

Rename,
  ras_hw_supported --> ras_hw_enabled, and
  ras_features     --> ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8ab0d6f0

drm/amdgpu: Move up ras_hw_supported · e509965e

由 Luben Tuikov 提交于 5月 03, 2021

Move ras_hw_supported into struct amdgpu_dev.
The dependency is:
struct amdgpu_ras <== struct amdgpu_dev <== ASIC,
read as "struct amdgpu_ras depends on struct
amdgpu_dev, which depends on the hardware."

This can be loosely understood as, "if RAS is
supported, which is property of the ASIC (struct
amdgpu_dev), then we can access struct
amdgpu_ras."

v2: Fix a typo: must binary AND in ternary cond
    in amdgpu_ras.c

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e509965e

drm/amdgpu: Remove redundant ras->supported · acdae216

由 Luben Tuikov 提交于 5月 03, 2021

Remove redundant ras->supported, as this value
is also stored in adev->ras_features.

Use adev->ras_features, as that supercedes "ras",
since the latter is its member.

The dependency goes like this:
ras <== adev->ras_features <== hw_supported,
and is read as "ras depends on ras_features, which
depends on hw_supported." The arrows show the flow
of information, i.e. the dependency update.

"hw_supported" should also live in "adev".

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

acdae216

drm/amdgpu: force enable gfx ras for vega20 ws · f50160cf

由 Stanley.Yang 提交于 4月 29, 2021

Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f50160cf

drm/amdgpu: enable gfx ras in aldebran by default · 1f6e8eb1

由 Hawking Zhang 提交于 4月 29, 2021

gfx ras now can be enabled by default in aldebaran
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1f6e8eb1

drm/amdgpu: enable ras error count query and reset for HDP · 78871b6c

由 Hawking Zhang 提交于 4月 28, 2021

add hdp block ras error query and reset support in
amdgpu ras error count query and reset interface
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

78871b6c

29 4月, 2021 1 次提交

drm/amdgpu: provide socket/die id info in RAS message · a30f1286

由 Hawking Zhang 提交于 4月 25, 2021

Add socket/die information in RAS messages for platforms
that support query those information
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a30f1286

24 4月, 2021 2 次提交

drm/amdgpu: disable gfx ras by default in aldebaran · 502f0e28

由 Hawking Zhang 提交于 4月 22, 2021

aldebaran gfx ras is still under development
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

502f0e28

drm/amdgpu: optimize gfx ras features flag clean · 19d0dfda

由 Stanley.Yang 提交于 4月 19, 2021

Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NFeifei Xu <Feifei.Xu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

19d0dfda

21 4月, 2021 2 次提交

drm/amdgpu: Reset RAS error count and status regs · 1f0d8e37

由 Mukul Joshi 提交于 3月 24, 2021

Reset the RAS error count and error status registers after
reading to prevent over reporting error counts on Aldebaran.
Signed-off-by: NMukul Joshi <mukul.joshi@amd.com>
Reviewed-By: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1f0d8e37

drm/amdgpu: fix a error injection failed issue · 6df23f4c

由 Dennis Li 提交于 4月 16, 2021

because "sscanf(str, "retire_page")" always return 0, if application use
the raw data for error injection, it always wrongly falls into "op ==
3". Change to use strstr instead.
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6df23f4c

16 4月, 2021 5 次提交

drm/amdgpu: Add double-sscanf but invert · 546aa546

由 Luben Tuikov 提交于 4月 14, 2021

Add back the double-sscanf so that both decimal
and hexadecimal values could be read in, but this
time invert the scan so that hexadecimal format
with a leading 0x is tried first, and if that
fails, then try decimal format.

Also use a logical-AND instead of nesting double
if-conditional.

See commit "drm/amdgpu: Fix a bug for input with double sscanf"

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

546aa546

drm/amdgpu: Fix kernel-doc for the RAS sysfs interface · 737c375b

由 Luben Tuikov 提交于 4月 13, 2021

Imporve the kernel-doc for the RAS sysfs
interface. Fix the grammar, fix the context.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

737c375b

drm/amdgpu: Add bad_page_cnt_threshold to debugfs · 7fb64071

由 Luben Tuikov 提交于 4月 13, 2021

Add bad_page_cnt_threshold to debugfs, an optional
file system used for debugging, for reporting
purposes only--it usually matches the size of
EEPROM but may be different depending on the
"bad_page_threshold" kernel module option.

The "bad_page_cnt_threshold" is a dynamically
computed value. It depends on three things: the
VRAM size; the size of the EEPROM (or the size
allocated to the RAS table therein); and the
"bad_page_threshold" module parameter. It is a
dynamically computed value, when the amdgpu module
is run, on which further parameters and logic
depend, and as such it is helpful to see the
dynamically computed value in debugfs.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7fb64071

drm/amdgpu: Fix a bug in checking the result of reserve page · 80b0cd0f

由 Luben Tuikov 提交于 4月 13, 2021

Fix if (ret) --> if (!ret), a bug, for
"retire_page", which caused the kernel to recall
the method with *pos == end of file, and that
bounced back with error. On the first run, we
advanced *pos, but returned 0 back to fs layer,
also a bug.

Fix the logic of the check of the result of
amdgpu_reserve_page_direct()--it is 0 on success,
and non-zero on error, not the other way
around. This patch fixes this bug.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

80b0cd0f

drm/amdgpu: Fix a bug for input with double sscanf · 6cb7a1d4

由 Luben Tuikov 提交于 4月 13, 2021

Remove double-sscanf to scan for %llu and 0x%llx,
as that is not going to work!

The %llu will consume the "0" in "0x" of your
input, and the hex value you think you're entering
will always be 0. That is, a valid hex value can
never be consumed.

On the other hand, just entering a hex number
without leading 0x will either be scanned as a
string and not match, for instance FAB123, or
the leading decimal portion is scanned as the
%llu, for instance 123FAB will be scanned as 123,
which is not correct.

Thus remove the first %llu scan and leave only the
%llx scan, removing the leading 0x since %llx can
scan either.

Addresses are usually always hex values, so this
suffices.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Xinhui Pan <xinhui.pan@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6cb7a1d4

10 4月, 2021 6 次提交

drm/amdgpu: page retire over debugfs mechanism · cbb8f989

由 John Clements 提交于 4月 09, 2021

added support in RAS debugfs to add bad page for isolated page retirement testing
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cbb8f989

drm/amdgpu: RAS harvest on driver load · 134d16d5

由 John Clements 提交于 3月 25, 2021

In event of RAS UE + warm reset, error counters shall be harvested and cleared on driver load
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

134d16d5

drm/amdgpu: split gfx callbacks into ras and non-ras ones · 719a9b33

由 Hawking Zhang 提交于 3月 19, 2021

gfx ras is only available in cerntain ip generations.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

719a9b33

drm/amdgpu: split mmhub callbacks into ras and non-ras ones · 8bc7b360

由 Hawking Zhang 提交于 3月 19, 2021

mmhub ras is only avaiable in cerntain mmhub ip
generation.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8bc7b360

drm/amdgpu: split umc callbacks to ras and non-ras ones · 49070c4e

由 Hawking Zhang 提交于 3月 17, 2021

umc ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. split umc callbacks
into ras and non-ras ones so gpu driver only
initializes umc ras callbacks when it manages
umc ras.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

49070c4e

drm/amdgpu: move xgmi ras functions to xgmi_ras_funcs · 52137ca8

由 Hawking Zhang 提交于 3月 18, 2021

xgmi ras is not managed by gpu driver when gpu is
connected to cpu through xgmi. move all xgmi ras
functions to xgmi_ras_funcs so gpu driver only
initializes xgmi ras functions when it manages
xgmi ras.
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NDennis Li <Dennis.Li@amd.com>
Reviewed-by: NJohn Clements <John.Clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

52137ca8

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功