Commits · 9ca0674a71a5112fa9931d8f5fbe84cac28765a2 · openeuler / Kernel

06 Jan, 2021 · 1 commit

drm/amdgpu: remove redundant logic related HDP · 9ca0674a

Committed by Likun Gao on Dec 28, 2020

Remove the hdp_flush function from the amdgpu_nbio struct, as it has been unified
into the hdp struct.
Remove the unused include for the HDP registers.
V2: Remove hdp golden setting which is unnecessary.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


15 Aug, 2020 · 1 commit

drm/amdgpu: bypass querying ras error count registers · f75e94d8

Committed by Guchun Chen on Aug 04, 2020

Add a guard around the RAS error count register harvest of all IPs:
once RAS recovery has been issued by the RAS sync flood interrupt or
the RAS controller interrupt, the harvest is bypassed; otherwise it runs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

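A minimal sketch of the guard described above, assuming a single flag that the interrupt path sets when it issues recovery. All names (`ras_recovery_in_progress`, `ras_begin_recovery`, `ras_harvest_error_counts`, and the simulated per-IP counters) are hypothetical and not taken from the driver; C11 atomics stand in for whatever synchronization the real code uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: set once RAS recovery has been issued by the sync
 * flood interrupt or the RAS controller interrupt. */
static atomic_bool ras_recovery_in_progress = false;

/* Hypothetical per-IP counters standing in for the error count registers. */
#define NUM_IPS 4
static unsigned int hw_error_regs[NUM_IPS];
static unsigned int harvested_counts[NUM_IPS];

/* Called from the interrupt path that issues recovery. */
static void ras_begin_recovery(void)
{
    atomic_store(&ras_recovery_in_progress, true);
}

/* Harvest the error count registers of all IPs unless recovery has been
 * issued. Returns true if the harvest ran, false if it was bypassed. */
static bool ras_harvest_error_counts(void)
{
    if (atomic_load(&ras_recovery_in_progress))
        return false; /* bypass: recovery already issued */

    for (size_t i = 0; i < NUM_IPS; i++) {
        harvested_counts[i] += hw_error_regs[i];
        hw_error_regs[i] = 0; /* assumed read-to-clear counters */
    }
    return true;
}
```

Checking the flag once at the top keeps the harvest loop unchanged; the sketch does not model when the flag is cleared after recovery completes.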

14 Apr, 2020 · 1 commit

drm/amdgpu: refine ras related message print · 6952e99c

Committed by Guchun Chen on Apr 10, 2020

Prefix RAS-related kernel messages with PCI
device info by replacing DRM_INFO/WARN/ERROR with
dev_info/warn/err, so users can clearly tell which
GPU device a RAS message comes from. Also add some
other RAS messages to make the output clearer
and friendlier.
Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


02 Apr, 2020 · 2 commits

drm/amdgpu: ih doorbell size of range changed for nbio v7.4 · b635ae87

Committed by Alex Sierra on Mar 18, 2020

[Why]
On nbio v7.4 each IH doorbell is 64 bits wide, so each one takes 2 DWords.

[How]
Change the IH doorbell range size from 2 to 4, i.e. two DWords per ring.
The current configuration uses two IH rings.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

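The sizing in the message is plain arithmetic: the range size counts DWords, and a 64-bit doorbell occupies two of them. The helper below is hypothetical and only reproduces that calculation.

```c
#include <assert.h>

/* Bytes in one DWord. */
#define DWORD_BYTES 4u

/* Hypothetical helper: number of DWords needed for `num_rings` doorbells
 * that are each `doorbell_bits` wide (64 on nbio v7.4). */
static unsigned int ih_doorbell_range_dwords(unsigned int num_rings,
                                             unsigned int doorbell_bits)
{
    const unsigned int dword_bits = DWORD_BYTES * 8u;

    assert(doorbell_bits % dword_bits == 0);
    return num_rings * (doorbell_bits / dword_bits);
}
```

With 32-bit doorbells the same two rings would need only 2 DWords, which would account for the previous value of 2.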

drm/amdgpu: cleanup all virtualization detection routine · 3aa0115d

Committed by Monk Liu on Mar 04, 2020

We need to move virt detection much earlier because:
1) The HW team confirmed that RCC_IOV_FUNC_IDENTIFIER will always
be at DWord MMIO offset 0xDE5 starting from Vega10, so there is no
need to implement a detect_hw_virt() routine in each nbio/chip file.
For the VI SRIOV chips (Tonga and Fiji), BIF_IOV_FUNC_IDENTIFIER is at
0x1503.

2) We need to know that we are an SRIOV VF before we do IP discovery, because
the host updates the IP discovery content every time it receives
a new "REQ_GPU_INIT_DATA" request from the guest (patches
for this new handshake will follow soon).
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

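The fixed offsets quoted above are DWord indices into MMIO space. A sketch of turning them into byte offsets follows; the `asic_family` enum and `iov_func_identifier_byte_offset` helper are hypothetical, and only the two DWord offsets (0xDE5 and 0x1503) come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grouping of the ASICs named in the message. */
enum asic_family {
    ASIC_VI_SRIOV,        /* Tonga, Fiji: BIF_IOV_FUNC_IDENTIFIER */
    ASIC_VEGA10_OR_LATER  /* RCC_IOV_FUNC_IDENTIFIER */
};

/* DWord offsets taken from the commit message. */
#define RCC_IOV_FUNC_IDENTIFIER_DW 0xDE5u
#define BIF_IOV_FUNC_IDENTIFIER_DW 0x1503u

/* MMIO registers are indexed in DWords; multiply by 4 for a byte offset. */
static uint32_t iov_func_identifier_byte_offset(enum asic_family family)
{
    uint32_t dw = (family == ASIC_VI_SRIOV) ? BIF_IOV_FUNC_IDENTIFIER_DW
                                             : RCC_IOV_FUNC_IDENTIFIER_DW;
    return dw * 4u;
}
```

Because the offset no longer depends on the nbio generation, a single early check can replace the per-chip detect_hw_virt() routines.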

19 Feb, 2020 · 1 commit

drm/amdgpu: record non-zero error counter info in NBIO before resetting GPU · 3cd4f618

Committed by Guchun Chen on Feb 13, 2020

When an NBIO RAS error happens, record the error counter information
before triggering a GPU reset. This fixes the issue of the error counter
value being missed when it is read from debugfs.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


08 Jan, 2020 · 1 commit

drm/amdgpu: simplify function return logic · 8831fa6e

Committed by Guchun Chen on Dec 24, 2019

The former return logic was redundant.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


19 Dec, 2019 · 1 commit

drm/amdgpu: drop useless BACO arg in amdgpu_ras_reset_gpu · 61934624

Committed by Guchun Chen on Dec 13, 2019

The BACO reset mode is decided by a later function when
amdgpu_ras_reset_gpu is called, so drop the argument to avoid
confusing readers.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


06 Dec, 2019 · 3 commits

drm/amdgpu: clear uncorrectable parity error status bit · 5c39d600

Committed by Le Ma on Nov 22, 2019

This bit should be cleared during every NBIF uncorrectable error cleanup.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: clear ras controller status registers when interrupt occurs · 28f87950

Committed by Le Ma on Nov 22, 2019

Fix an issue where the RAS controller interrupt could no longer be triggered
after a single NBIF uncorrectable error. The error count is also stored in the
NBIF RAS object for querying.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: remove ras global recovery handling from ras_controller_int handler · 4a2d9356

Committed by Le Ma on Oct 22, 2019

v2: add a notification when the RAS controller interrupt is generated
Signed-off-by: Le Ma <Le.Ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


16 Oct, 2019 · 1 commit

drm/amdgpu/soc15: disable doorbell interrupt as part of BACO entry sequence · 956f6705

Committed by Le Ma on Oct 11, 2019

Workaround to make RAS recovery work in BACO reset.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


16 Sep, 2019 · 4 commits

Revert "drm/amdgpu/nbio7.4: add hw bug workaround for vega20" · 3e103fc3

Committed by Kent Russell on Sep 10, 2019

This reverts commit e01f2d41.

VG20 did not require this workaround, as the fix is in the VBIOS.
Leave the VG10/12 workaround in place, as some older shipped cards do not
have the VBIOS fix, and the kernel workaround is required in those
situations.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: implement ras query function for pcie bif · 1a3f2e8c

Committed by Guchun Chen on Sep 11, 2019

Implement the RAS error query functionality.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add ras error query count interface for nbio · 52652ef2

Committed by Guchun Chen on Sep 04, 2019

Add the interface query_ras_error_count for nbio.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: remove duplicated header file include · 1bd252c5

Committed by Guchun Chen on Sep 10, 2019

amdgpu_ras.h is already included.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


14 Sep, 2019 · 7 commits

drm/amdgpu/nbio: switch to amdgpu_nbio_ras_late_init helper function · 1c70d3d9

Committed by Hawking Zhang on Sep 03, 2019

amdgpu_nbio_ras_late_init is used to init the nbio-specific
RAS debugfs/sysfs node and the nbio-specific interrupt handler.
It can be shared among nbio generations.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: set ip specific ras interface pointer to NULL after free it · d094aea3

Committed by Hawking Zhang on Sep 03, 2019

This prevents access to dangling pointers.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

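The free-then-NULL pattern looks like this in isolation. The structures and `ip_block_ras_fini` are hypothetical stand-ins for the per-IP RAS interface, and userspace `free()` stands in for the kernel's `kfree()`; both treat a NULL argument as a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-IP RAS interface object. */
struct ras_if {
    int block_id;
};

/* Hypothetical IP state holding an optional RAS interface pointer. */
struct ip_block {
    struct ras_if *ras_if;
};

/* Free the RAS interface and clear the pointer, so later code sees NULL
 * instead of a dangling address. Safe to call more than once. */
static void ip_block_ras_fini(struct ip_block *ip)
{
    assert(ip != NULL);
    free(ip->ras_if);   /* free(NULL) does nothing */
    ip->ras_if = NULL;
}
```

Clearing the pointer also makes a second teardown call harmless instead of a double free.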

drm/amdgpu: Avoid HW GPU reset for RAS. · 7c6e68c7

Committed by Andrey Grodzovsky on Sep 13, 2019

Problem:
Under certain conditions, when some IP blocks take a RAS error,
we can get into a situation where a GPU reset is not possible
due to issues in RAS in SMU/PSP.

Temporary fix until a proper solution in PSP/SMU is ready:
When an uncorrectable error happens, the DF will unconditionally
broadcast error event packets to all its clients/slaves upon
receiving the fatal error event and freeze all its outbound queues,
and the err_event_athub interrupt will be triggered.
In such a case we use this interrupt
to issue a GPU reset. The GPU reset code is modified for this case to avoid a HW
reset: it only stops the schedulers, detaches the fences of all in-progress and
not yet scheduled jobs, sets an error code on them and signals them.
It also rejects any new incoming job submissions from user space.
All this is done to notify the applications of the problem.

v2:
Extract amdgpu_amdkfd_pre/post_reset from amdgpu_device_lock/unlock_adev
Move amdgpu_job_stop_all_jobs_on_sched to amdgpu_job.c
Remove print param from amdgpu_ras_query_error_count

v3:
Update based on the previous bug-fixing patch to properly call amdgpu_amdkfd_pre_reset
for other XGMI hive members.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

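A simplified model of the soft reset path described above: no hardware reset, just stop scheduling, fail and signal every pending fence, and refuse new submissions. The types, the function names, and the choice of `-ECANCELED` as the error code are assumptions for illustration; the message does not say which error code the driver sets.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified job fence. */
struct job_fence {
    int error;      /* 0 or a negative errno */
    bool signaled;
};

/* Hypothetical device state. */
struct gpu_state {
    bool sched_running;
    bool reject_submissions;
};

/* Soft "reset" path without a HW reset: stop scheduling, fail every
 * pending fence with `err`, signal it, and refuse further submissions. */
static void ras_stop_all_jobs(struct gpu_state *gpu, struct job_fence *fences,
                              size_t n, int err)
{
    assert(err < 0);
    gpu->sched_running = false;
    for (size_t i = 0; i < n; i++) {
        if (fences[i].signaled)
            continue;               /* already completed; leave untouched */
        fences[i].error = err;
        fences[i].signaled = true;  /* wake waiters so they see the error */
    }
    gpu->reject_submissions = true;
}

/* Hypothetical submission entry point: 0 on success, negative errno if rejected. */
static int submit_job(const struct gpu_state *gpu)
{
    return gpu->reject_submissions ? -ECANCELED : 0;
}
```

Signalling the failed fences, rather than leaving them pending, is what lets waiting applications observe the error.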

drm/amdgpu: add ras_late_init callback function for nbio v7_4 (v3) · 9ad1dc29

Committed by Hawking Zhang on Aug 29, 2019

The ras_late_init callback function will be used to do common RAS
init in the late init phase.

v2: call ras_late_fini to clean up when enabling the interrupt fails

v3: rename sysfs/debugfs node name to pcie_bif_xxx
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add ras_controller and err_event_athub interrupt support · 4e644fff

Committed by Hawking Zhang on Jun 05, 2019

Ras controller interrupt and Ras err event athub interrupt are two dedicated
interrupts for RAS support.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/nbio: add functions to query ras specific interrupt status · 4241863a

Committed by Hawking Zhang on May 30, 2019

ras_controller_interrupt and err_event_interrupt are RAS-specific interrupts.
Add functions to check their status and ack them if they are generated. Both
functions should only be invoked in the ISR when the BIF ring is disabled or
not even initialized.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

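A sketch of a query-and-ack helper of the kind described, polled directly rather than delivered through the BIF ring. The status bit positions and the write-1-to-clear behaviour of the simulated register are assumptions; the real registers and their acknowledge mechanism may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits; real register layouts differ. */
#define RAS_CNTLR_INT_STATUS (1u << 0)
#define ERR_EVENT_INT_STATUS (1u << 1)

/* Simulated status register, modelled as write-1-to-clear (an assumption). */
static uint32_t bif_int_status;

static uint32_t reg_read(void) { return bif_int_status; }
static void reg_write_w1c(uint32_t v) { bif_int_status &= ~v; }

/* Return true and ack the interrupt if `mask` is pending. Meant to be
 * polled from the ISR when the BIF ring is disabled or not initialized. */
static bool query_and_ack(uint32_t mask)
{
    if (!(reg_read() & mask))
        return false;
    reg_write_w1c(mask);
    return true;
}
```

Acking only the bit that was checked leaves the other RAS interrupt pending for its own query.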

drm/amdgpu: switch to new amdgpu_nbio structure · bebc0762

Committed by Hawking Zhang on Aug 23, 2019

No functional change, just switch to the new structures.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


19 Jul, 2019 · 4 commits

drm/amdgpu: add vcn nbio doorbell range setting for 2nd vcn instance · 989b6a05

Committed by James Zhu on Jul 10, 2019

add vcn nbio doorbell range setting for 2nd vcn instance
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: add vcn doorbell range function to nbio7.4 (v2) · 39a5053f

Committed by Leo Liu on Jul 09, 2019

Set up the aperture for VCN2.5.

v2: setup vcn doorbells in vcn2.5 hw_init (Alex)
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: support sdma 2~7 doorbell range register offset · 3d81f67a

Committed by Le Ma on Sep 19, 2018

Update the doorbell range registers to support additional
SDMA rings.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: support hdp flush for more sdma instances · 0fe6a7b4

Committed by Le Ma on Sep 10, 2018

The RSVD_ENG0 to RSVD_ENG5 bits in GPU_HDP_FLUSH_REQ/GPU_HDP_FLUSH_DONE
can be used by SDMA instances 2 to 7 to poll registers/memory.
Signed-off-by: Le Ma <le.ma@amd.com>
Acked-by: Snow Zhang <Snow.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

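One way to express the instance-to-bit mapping, assuming RSVD_ENG0 through RSVD_ENG5 are contiguous bits. The bit positions and the `sdma_hdp_flush_mask` helper are hypothetical placeholders, not values or functions from the driver's register headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions in GPU_HDP_FLUSH_REQ/DONE; the real ones are
 * defined in the register headers and may differ. */
#define HDP_FLUSH_SDMA0_BIT     10u
#define HDP_FLUSH_SDMA1_BIT     11u
#define HDP_FLUSH_RSVD_ENG0_BIT 12u /* RSVD_ENG0..RSVD_ENG5 assumed contiguous */

/* Map an SDMA instance (0..7) to its flush request/done mask. Instances 2..7
 * borrow RSVD_ENG0..RSVD_ENG5. Returns 0 for an out-of-range instance. */
static uint32_t sdma_hdp_flush_mask(unsigned int instance)
{
    switch (instance) {
    case 0:
        return 1u << HDP_FLUSH_SDMA0_BIT;
    case 1:
        return 1u << HDP_FLUSH_SDMA1_BIT;
    default:
        if (instance <= 7)
            return 1u << (HDP_FLUSH_RSVD_ENG0_BIT + (instance - 2u));
        return 0;
    }
}
```

The same mask is written to the request register and polled for in the done register, so each instance needs a distinct bit.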

25 May, 2019 · 1 commit

drm/amdgpu: Remap hdp coherency registers · 88807dc8

Committed by Oak Zeng on Apr 04, 2019

Remap HDP_MEM_COHERENCY_FLUSH_CNTL and HDP_REG_COHERENCY_FLUSH_CNTL
to an empty page in MMIO space. We will later map this page into process
space so applications can flush HDP. This can't be done properly at
those registers' original location, because that would expose more
registers than desired to process space.

v2: Use explicit register hole location
v3: Moved remapped hdp registers into adev struct
v4: Use more generic name for remapped page
    Expose register offset in kfd_ioctl.h
v5: Move hdp register remap function to nbio ip function
v6: Fixed operator precedence issue and other bugs
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

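The constraint behind the remap is that only whole pages can be mapped into a process, so the two flush registers must sit in a page that holds nothing else. A sketch of that check follows; the in-page offsets, the helper names, and the hole offset used in testing are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size */

/* Hypothetical offsets of the two remapped registers inside the page. */
#define REMAP_HDP_MEM_FLUSH_CNTL 0x0u
#define REMAP_HDP_REG_FLUSH_CNTL 0x4u

/* A remap target is usable only if it is page aligned, so exactly one
 * page can be mapped to the process, and both registers fit in it. */
static bool remap_page_ok(uint32_t hole_offset)
{
    return (hole_offset % PAGE_SIZE_BYTES) == 0 &&
           REMAP_HDP_REG_FLUSH_CNTL + 4u <= PAGE_SIZE_BYTES;
}

/* Byte offset, from the MMIO base, of a remapped 32-bit register. */
static uint32_t remapped_reg(uint32_t hole_offset, uint32_t reg_in_page)
{
    assert(reg_in_page + 4u <= PAGE_SIZE_BYTES);
    return hole_offset + reg_in_page;
}
```

Exposing the in-page offsets (as v4 does through kfd_ioctl.h) lets user space locate each register once the page is mapped.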

14 Feb, 2019 · 1 commit

drm/amdgpu: fix several indentation issues · 9b49c197

Committed by Colin Ian King on Feb 12, 2019

There are several statements that are incorrectly indented. Fix these.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


01 Feb, 2019 · 1 commit

drm/amdgpu: Implement doorbell self-ring for NBIO 7.4 · 12292519

Committed by Jay Cornwall on Jan 30, 2019

Fixes doorbell reflection on Vega20.

Change-Id: I0495139d160a9032dff5977289b1eec11c16f781
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


26 Jan, 2019 · 1 commit

drm/amdgpu: Fix sdma doorbell range setting · 8987e2e2

Committed by Oak Zeng on Dec 17, 2018

Different ASICs have different numbers of SDMA queues and
therefore different SDMA doorbell ranges. Introduce an extra
parameter to the sdma_doorbell_range function so that the
SDMA doorbell range is set correctly.
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

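A sketch of packing a doorbell range register with the per-ASIC size passed in as a parameter. The field shifts and masks, and the `sdma_doorbell_range_value` helper, are hypothetical placeholders, not the real register layout or driver signature.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout of a doorbell range register. */
#define DB_RANGE_OFFSET_SHIFT 2u
#define DB_RANGE_OFFSET_MASK  0x00000FFCu /* bits 2..11 */
#define DB_RANGE_SIZE_SHIFT   16u
#define DB_RANGE_SIZE_MASK    0x001F0000u /* bits 16..20 */

/* Build the range register value for one SDMA instance. `doorbell_size` is
 * the extra per-ASIC parameter: ASICs with more SDMA queues need a larger
 * range. Out-of-range inputs are truncated by the masks. */
static uint32_t sdma_doorbell_range_value(uint32_t doorbell_index,
                                          uint32_t doorbell_size)
{
    uint32_t v = 0;

    v |= (doorbell_index << DB_RANGE_OFFSET_SHIFT) & DB_RANGE_OFFSET_MASK;
    v |= (doorbell_size << DB_RANGE_SIZE_SHIFT) & DB_RANGE_SIZE_MASK;
    return v;
}
```

Passing the size in, instead of hard-coding it, lets each ASIC's init code supply its own queue count while the register packing stays shared.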

15 Jan, 2019 · 1 commit

drm/amdgpu: Add NBIO SMN headers v2 · a0bb79e2

Committed by Kent Russell on Jan 07, 2019

We need these offsets for PCIe perf counters, so include them as well as
the previously used defines from the nbio_*.c files.

v2: Return NBIF definitions back to previous files
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


21 Dec, 2018 · 1 commit

drm/amdgpu/nbio7.4: add hw bug workaround for vega20 · e01f2d41

Committed by Alex Deucher on Dec 19, 2018

Configure PCIE_CI_CNTL to work around a hw bug that affects
some multi-GPU compute workloads.
Acked-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


20 Sep, 2018 · 1 commit

drm/amdgpu: add vega20 sriov capability detection · a2045ee6

Committed by Frank Min on Apr 27, 2018

Add SRIOV capability detection for vega20, so the driver can check whether
the device is a virtual device.
Signed-off-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


28 Aug, 2018 · 3 commits

drm/amdgpu: fix sdma doorbell range setting · 52de2ea7

Committed by Evan Quan on Aug 21, 2018

Use the old doorbell range setting until the driver is
able to support more sdma queues.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Add nbio 7.4 support for vega20 (v3) · fe3c9489

Committed by Feifei Xu on Mar 23, 2018

Some register offsets in nbio v7.4 differ from those in v7.0.
We need a separate nbio_v7_4.c for vega20.

v2: fix doorbell range for sdma (Alex)
v3: squash in static fix (kbuild test robot)
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Revert "drm/amdgpu: Add nbio support for vega20 (v2)" · 25eaa565

Committed by Alex Deucher on Apr 03, 2018

Revert this to add proper nbio 7.4 support.

This reverts commit f5b2e1fa321eff20a9418ebd497d8a466f024a85.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


17 May, 2018 · 1 commit

drm/amdgpu: Add nbio support for vega20 (v2) · a95d89e2

Committed by Feifei Xu on Mar 23, 2018

Some register offsets in nbio v7.4 differ from those in v7.0.

v2: Use nbio7.0 for now.

TODO: add a new nbio 7.4 module (Alex)
Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


27 Feb, 2018 · 1 commit

drm/amdgpu: use the TTM dummy page instead of allocating one · 92e71b06

Committed by Christian König on Feb 22, 2018

We have a global dummy page in TTM, use that one instead of allocating a
new one.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


20 Feb, 2018 · 1 commit

drm/amdgpu: add optional ring to *_hdp callbacks · 69882565

Committed by Christian König on Jan 19, 2018

This adds an optional ring to the invalidate_hdp and flush_hdp
callbacks. If the ring isn't specified or the emit_wreg function is not
available, the HDP operation is done by the CPU; otherwise it is done by
writing on the ring.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

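The dispatch rule in the message reduces to one check. A sketch follows; the `ring` structure, the `HDP_FLUSH_REG` offset, the written value, and the counters that record which path ran are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring with an optional register-write emitter. */
struct ring {
    void (*emit_wreg)(struct ring *ring, uint32_t reg, uint32_t val);
    unsigned int emitted;  /* register-write packets placed on this ring */
};

static unsigned int cpu_writes;  /* simulated direct MMIO writes */

static void cpu_wreg(uint32_t reg, uint32_t val)
{
    (void)reg;
    (void)val;
    cpu_writes++;
}

static void ring_emit_wreg(struct ring *ring, uint32_t reg, uint32_t val)
{
    (void)reg;
    (void)val;
    ring->emitted++;
}

#define HDP_FLUSH_REG 0x1234u /* hypothetical register offset */

/* Flush HDP through the ring when one is given and it can emit register
 * writes; otherwise fall back to a CPU MMIO write. */
static void flush_hdp(struct ring *ring)
{
    if (ring == NULL || ring->emit_wreg == NULL)
        cpu_wreg(HDP_FLUSH_REG, 0);
    else
        ring->emit_wreg(ring, HDP_FLUSH_REG, 0);
}
```

Writing on the ring orders the flush with the commands already queued on it, while the CPU path takes effect immediately.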
