Commits · d0fb18b535679a28b1f55a312b7454563b9bb36e · openeuler / Kernel

10 Feb, 2022 1 commit

drm/amdgpu: Move reset sem into reset_domain · d0fb18b5

Committed by Andrey Grodzovsky on Jan 19, 2022

We want a single instance of the reset sem across all
reset clients because, in the case of XGMI, we should stop
cross-device MMIO access, since any of the devices could be
in reset at that moment.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74117.html

15 Dec, 2021 1 commit

drm/amdgpu: move smu_debug_mask to a more proper place · 7e31a858

Committed by Evan Quan on Dec 13, 2021

The smu_context will become invisible from outside (of power). Also,
the smu_debug_mask can then be shared across all power code instead of
only one specific framework (swSMU).
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Dec, 2021 1 commit

drm/amdgpu: add support for SMU debug option · 6ff7fddb

Committed by Lang Yu on Nov 30, 2021

SMU firmware expects the driver to maintain the error context
and to stop interacting with the SMU once SMU errors have
occurred. That aids in debugging SMU firmware issues.

Add SMU debug option support for this request. It can be
enabled or disabled via the amdgpu_smu_debug debugfs file.
A 32-bit mask indicates the corresponding debug modes.
Currently, only one mode (HALT_ON_ERROR) is supported.
When enabled, it brings the hardware to a kind of halt state
so that no one can touch it any more in the event of SMU
errors.

The driver interacts with the SMU by sending messages, and
there are three ways of sending messages to the SMU in the current
implementation. They are handled respectively as follows:

1, smu_cmn_send_smc_msg_with_param() for normal timeout cases

  Halt on any error.

2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases

  Halt on errors apart from ETIME. Otherwise this way won't work.
  Let the user handle the ETIME error in such a case.

3, smu_cmn_send_msg_without_waiting() for no waiting cases

  Halt on errors apart from ETIME. Otherwise the second way won't work.
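The halt policy for the three send paths can be sketched as a small user-space decision function. This is only an illustration of the rules listed above, not the driver code: the enum, the function name and the bit position are hypothetical, and the only details taken from the commit message are the 32-bit mask, the HALT_ON_ERROR mode and the special treatment of ETIME.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position; chosen so that writing 0x1 enables the mode. */
#define SMU_DEBUG_HALT_ON_ERROR (1u << 0)

/* Hypothetical labels for the three ways of sending a message. */
enum smu_send_path {
    SMU_SEND_WITH_PARAM, /* way 1: send and wait, normal timeout */
    SMU_SEND_THEN_WAIT,  /* way 2: send without waiting, wait separately */
    SMU_SEND_NO_WAIT     /* way 3: send without waiting at all */
};

/* err is 0 on success or a negative errno value, kernel style. */
static bool smu_should_halt(uint32_t debug_mask, enum smu_send_path path, int err)
{
    assert(err <= 0);

    /* Nothing to do when the mode is off or the message succeeded. */
    if (!(debug_mask & SMU_DEBUG_HALT_ON_ERROR) || err == 0)
        return false;

    /* Way 1 halts on any error. */
    if (path == SMU_SEND_WITH_PARAM)
        return true;

    /* Ways 2 and 3 leave ETIME to the caller and halt on everything else. */
    return err != -ETIME;
}
```

With the mask set to 0x1, a timeout on the first path returns true, while the same timeout on the second or third path returns false so that a later wait can still succeed.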

== Command Guide ==

1, enable HALT_ON_ERROR mode

 # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable HALT_ON_ERROR mode

 # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v5:
 - Use bit mask to allow more debug features.(Evan)
 - Use WARN() instead of BUG().(Evan)

v4:
 - Set to halt state instead of a simple hang.(Christian)

v3:
 - Use debugfs_create_bool().(Christian)
 - Put variable into smu_context struct.
 - Don't resend command when timeout.

v2:
 - Resend command when timeout.(Lijo)
 - Use debugfs file instead of module parameter.
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

07 Oct, 2021 2 commits

drm/amdgpu: unify BO evicting method in amdgpu_ttm · 58144d28

Committed by Nirmoy Das on Oct 06, 2021

Unify BO evicting functionality for possible memory
types in amdgpu_ttm.c.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: return early if debugfs is not initialized · 5b9581df

Committed by Nirmoy Das on Oct 06, 2021

Check first if debugfs is initialized before creating
amdgpu debugfs files.

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

06 Oct, 2021 1 commit

drm/amdgpu: revert "Add autodump debugfs node for gpu reset v8" · c8365dbd

Committed by Christian König on Sep 30, 2021

This reverts commit 728e7e0c.

Further discussion reveals that this feature is severely broken
and needs to be reverted ASAP.

GPU reset can never be delayed by userspace, even for debugging;
otherwise we can run into in-kernel deadlocks.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

05 Oct, 2021 1 commit

drm/amdgpu: add debugfs access to the IP discovery table · 81d1bf01

Committed by Alex Deucher on Jul 20, 2021

Useful for debugging and new ASIC validation.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15 Sep, 2021 4 commits

drm/amdgpu: use IS_ERR for debugfs APIs · b04ce53e

Committed by Nirmoy Das on Sep 02, 2021

debugfs APIs return an encoded error, so use
IS_ERR to check the return value.

v2: return PTR_ERR(ent)
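
The "encoded error" convention can be illustrated in user space as follows. This is a minimal re-creation of the idea behind the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers, written from scratch for illustration; the fake_debugfs_* functions are hypothetical stand-ins and are not part of any kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Error codes are folded into the top 4095 values of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical create call: returns an encoded error, never NULL, on failure. */
static void *fake_debugfs_create(bool fail)
{
    static int dentry; /* stands in for a real directory entry */
    return fail ? ERR_PTR(-ENOMEM) : (void *)&dentry;
}

/* Caller pattern from the commit: test with IS_ERR, return PTR_ERR(ent). */
static int fake_debugfs_init(bool fail)
{
    void *ent = fake_debugfs_create(fail);

    if (IS_ERR(ent))
        return (int)PTR_ERR(ent);

    assert(ent != NULL);
    return 0;
}
```

A plain NULL check would miss the failure here, because an encoded error pointer is non-NULL; that is the situation the commit message describes.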

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Fix a race of IB test · 0fcfb300

Committed by xinhui pan on Sep 11, 2021

Direct IB submission should be exclusive, so use the write lock.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: cleanup debugfs for amdgpu rings · 62d266b2

Committed by Nirmoy Das on Sep 02, 2021

Use the debugfs_create_file_size API for creating ring debugfs files and,
as it's a NULL returning API, change the return type of the
amdgpu_debugfs_ring_init API as well. Also clean up the surrounding code.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use IS_ERR for debugfs APIs · 59715cff

Committed by Nirmoy Das on Sep 02, 2021

debugfs APIs return an encoded error, so use
IS_ERR to check the return value.

v2: return PTR_ERR(ent)

References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

02 Sep, 2021 1 commit

drm/amd/amdgpu: New debugfs interface for MMIO registers (v5) · 37df9560

Committed by Tom St Denis on Aug 20, 2021

This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range, which
the previous one didn't. With this new design we have room to grow
the flexibility of the file as need be.

(v2): Move read/write to .read/.write, fix style, add comment
      for IOCTL data structure

(v3): C style comments

(v4): use u32 in struct and remove offset variable

(v5): Drop flag clearing in op function, use 0xFFFFFFFF for broadcast
      instead of 0x3FF, use mutex for op/ioctl.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

17 Aug, 2021 1 commit

drm/amd/amdgpu embed hw_fence into amdgpu_job · c530b02f

Committed by Jack Zhang on May 12, 2021

Why: Previously the hw fence was allocated separately from the job.
This caused historical lifetime issues and corner cases.
The ideal situation is to let the fence manage both the job's
and the fence's lifetime, and to simplify the design of the gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover normal job submission with this method.
2. For ib_test and for submissions without a parent job, keep the
legacy way of creating a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling
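
The embedding idea, with a flag that marks whether a fence lives inside a job, can be sketched in user space as below. All struct and function names here are hypothetical, and the flag value is an assumption; only the concept of an "embedded in job" flag bit and the two creation paths (embedded versus separately created) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed flag value, standing in for an "embedded in job" bit. */
#define FAKE_FENCE_FLAG_EMBED_IN_JOB (1u << 0)

struct fake_fence {
    unsigned int flags;
};

struct fake_job {
    int id;
    struct fake_fence hw_fence; /* embedded, so job and fence share a lifetime */
};

/* Normal submission path: the fence is part of the job. */
static void fake_job_init(struct fake_job *job, int id)
{
    job->id = id;
    job->hw_fence.flags = FAKE_FENCE_FLAG_EMBED_IN_JOB;
}

/* Legacy path (for example an IB test): a fence with no parent job. */
static void fake_fence_init_standalone(struct fake_fence *fence)
{
    fence->flags = 0;
}

/* Returns the owning job, or NULL when the fence was created separately. */
static struct fake_job *fake_fence_to_job(struct fake_fence *fence)
{
    assert(fence != NULL);

    if (!(fence->flags & FAKE_FENCE_FLAG_EMBED_IN_JOB))
        return NULL;

    return container_of(fence, struct fake_job, hw_fence);
}
```

The flag check matters because applying container_of() to a standalone fence would compute a pointer to a job that does not exist.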
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16 Jun, 2021 1 commit

drm/amdgpu: remove amdgpu_vm_pt · 391629bd

Committed by Nirmoy Das on Jun 15, 2021

Page table entries are now embedded in the VM BO, so
we do not need struct amdgpu_vm_pt. This patch replaces
struct amdgpu_vm_pt with struct amdgpu_vm_bo_base.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

21 May, 2021 1 commit

drm/amd/amdgpu/amdgpu_debugfs: Fix a couple of misnamed functions · e72d4a8b

Committed by Lee Jones on May 20, 2021

Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1004: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_write(). Prototype was for amdgpu_debugfs_gfxoff_write() instead
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1053: warning: expecting prototype for amdgpu_debugfs_regs_gfxoff_status(). Prototype was for amdgpu_debugfs_gfxoff_read() instead

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

04 Mar, 2021 1 commit

drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie · 1aa46901

Committed by Kevin Wang on Mar 02, 2021

The register offset does not need to be divided by 4 before being passed to RREG32_PCIE().
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

03 Mar, 2021 1 commit

drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie · 43fb6c19

Committed by Kevin Wang on Mar 02, 2021

The register offset does not need to be divided by 4 before being passed to RREG32_PCIE().
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

27 Feb, 2021 1 commit

drm/amdgpu: Replace DEFINE_SIMPLE_ATTRIBUTE with DEFINE_DEBUGFS_ATTRIBUTE · 7271a5c2

Committed by Yang Li on Feb 25, 2021

Fix the following coccicheck warning:
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1589:0-23: WARNING:
fops_ib_preempt should be defined with DEFINE_DEBUGFS_ATTRIBUTE
./drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:1592:0-23: WARNING:
fops_sclk_set should be defined with DEFINE_DEBUGFS_ATTRIBUTE
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

19 Feb, 2021 4 commits

drm/amdgpu: do not use drm middle layer for debugfs · 98d28ac2

Committed by Nirmoy Das on Feb 15, 2021

Use debugfs API directly instead of drm middle layer.

This also includes the following debugfs file output changes:
1 amdgpu_evict_vram/amdgpu_evict_gtt output will not contain any braces.
  e.g. (0) --> 0
2 amdgpu_gpu_recover output will print the return value of
  amdgpu_device_gpu_recover() instead of the not so important "gpu recover"
  message.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
    * remove S_IFREG from mode.
    * remove mode variable.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: do not use drm middle layer for debugfs · 373720f7

Committed by Nirmoy Das on Feb 14, 2021

Use debugfs API directly instead of drm middle layer.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
    * remove S_IFREG from mode.
    * remove mode variable.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: do not use drm middle layer for debugfs · afd3a359

Committed by Nirmoy Das on Feb 14, 2021

Use debugfs API directly instead of drm middle layer.

v2: * checkpatch.pl: use '0444' instead of S_IRUGO.
    * remove S_IFREG from mode.
    * remove mode variable.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: do not keep debugfs dentry · 88293c03

Committed by Nirmoy Das on Feb 10, 2021

Clean up unnecessary debugfs dentries and surrounding functions.

v3: remove return value check for debugfs_create_file()
v2: remove ttm_debugfs_entries array.
    do not init variables.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

14 Jan, 2021 1 commit

drm/amdgpu: Add secure display TA interface · ecaafb7b

Committed by Jinzhou Su on Dec 09, 2020

Add an interface to load, unload and invoke commands for
the secure display TA.

v2: Add debugfs interface for secure display TA
v3: fix warning in copy_from_user (Alex)
Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

08 Dec, 2020 2 commits

gpu/drm: ring_mirror_list --> pending_list · 6efa4b46

Committed by Luben Tuikov on Dec 03, 2020

Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.

This also abstracts the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives, ring, CAM,
etc., shouldn't be exposed to DRM.

The pending_list keeps jobs submitted, which are
out of our control. Usually this means they are
pending execution status in hardware, but the
latter definition is a more general (inclusive)
definition.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/405573/

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>

drm/scheduler: "node" --> "list" · 8935ff00

Committed by Luben Tuikov on Dec 03, 2020

Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/403515/

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>

14 Nov, 2020 1 commit

drm/amd/amdgpu/amdgpu_debugfs: Demote obvious abuse of kernel-doc formatting · 20ed491b

Committed by Lee Jones on Nov 13, 2020

Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:308: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_read'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'f' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'buf' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'size' not described in 'amdgpu_debugfs_regs_write'
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c:317: warning: Function parameter or member 'pos' not described in 'amdgpu_debugfs_regs_write'

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

03 Nov, 2020 1 commit

drm/amdgpu/amdgpu: improve code indentation and alignment · f3729f7b

Committed by Deepak R Varma on Nov 02, 2020

General code indentation and alignment changes, such as replacing spaces
with tabs or aligning function arguments as per the coding style
guidelines. The patch corrects issues in various amdgpu_*.c files
for this driver. Issues reported by the checkpatch script.
Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

27 Oct, 2020 1 commit

drm/amdgpu: added support for psp fw attestation · 19ae3330

Committed by John Clements on Oct 26, 2020

Loaded fw can be queried from the sysfs interface.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: John Clements <john.clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16 Oct, 2020 1 commit

drm/amdgpu: Add debugfs entry for printing VM info · ff72bc40

Committed by Mihir Bhogilal Patel on Oct 08, 2020

Create a new debugfs entry to print memory info using VM buffer
objects.

V2: Added common function for printing BO info.
    Dump more VM lists for evicted, moved, relocated, invalidated.
    Removed dumping VM mapped BOs.
V3: Fixed coding style comments, renamed print API and variables.
V4: Fixed coding style comments.
Signed-off-by: Mihir Bhogilal Patel <Mihir.Patel@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

07 Oct, 2020 1 commit

drm/ttm: nuke ttm_bo_evict_mm and rename mgr function v3 · 4ce032d6

Committed by Christian König on Oct 01, 2020

Make it clearer what the resource manager function
does and nuke the wrapper function.

v2: nuke the wrapper
v3: fix typo in radeon, rebased
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v2)
Link: https://patchwork.freedesktop.org/patch/393914/

01 Oct, 2020 1 commit

drm/amdgpu: support indirect access reg outside of mmio bar (v2) · f7ee1874

Committed by Hawking Zhang on Sep 18, 2020

Support both direct and indirect accessors in unified
helper functions.

v2: Retire indirect mmio access via mm_index/data
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

25 Aug, 2020 4 commits

drm/amdgpu: Get DRM dev from adev by inline-f · 4a580877

Committed by Luben Tuikov on Aug 24, 2020

Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) · 1348969a

Committed by Luben Tuikov on Aug 24, 2020

Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
    instead of an invisible macro.
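
The difference between an untyped macro and a typed static inline helper can be sketched in user space as follows. The struct layouts, the fake_ prefixed names and the back-pointer field are assumptions made for illustration only; the commit message states just that drm_to_adev() resolves a struct drm_device pointer to a struct amdgpu_device pointer, and the neighbouring commit names the reverse helper adev_to_drm().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two device structures. */
struct fake_drm_device {
    void *dev_private; /* assumed back-pointer to the driver's device */
};

struct fake_amdgpu_device {
    struct fake_drm_device *ddev;
    int asic_id;
};

/* Untyped macro: compiles for any struct with a dev_private member and
 * yields a void pointer, so the compiler cannot catch a wrong argument. */
#define FAKE_DRM_TO_ADEV_MACRO(d) ((d)->dev_private)

/* Typed helper: both the argument and the result types are checked. */
static inline struct fake_amdgpu_device *
fake_drm_to_adev(struct fake_drm_device *ddev)
{
    return ddev->dev_private;
}

/* Reverse direction, mirroring the adev_to_drm() idea. */
static inline struct fake_drm_device *
fake_adev_to_drm(struct fake_amdgpu_device *adev)
{
    return adev->ddev;
}

/* Link the two structures together, as a probe routine might. */
static void fake_bind(struct fake_amdgpu_device *adev,
                      struct fake_drm_device *ddev)
{
    assert(adev != NULL && ddev != NULL);
    adev->ddev = ddev;
    ddev->dev_private = adev;
}
```

Routing every conversion through one typed function also means a later change to how the two structures are linked only has to touch that function.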
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: change reset lock from mutex to rw_semaphore · 6049db43

Committed by Dennis Li on Aug 20, 2020

Clients don't need the reset lock for synchronization when there is no
GPU recovery.

v2:
change to return the return value of down_read_killable.

v3:
if GPU recovery begins, the VF ignores the FLR notification.
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refine codes to avoid reentering GPU recovery · 53b3f8f4

Committed by Dennis Li on Aug 19, 2020

If other threads already hold the reset lock, recovery will
fail at try_lock. Therefore we introduce the atomic hive->in_reset
and adev->in_gpu_reset to avoid re-entering GPU recovery.
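
The re-entry guard can be sketched with C11 atomics in user space. This is an illustration of the general technique (an atomic compare-and-exchange on an "in reset" flag), not the driver's implementation: the struct and function names are hypothetical, and the kernel uses its own atomic primitives rather than stdatomic.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device with a single atomic "recovery in progress" flag. */
struct fake_device {
    atomic_int in_gpu_reset;
};

/* Try to become the one thread that runs recovery. Returns false when
 * another recovery is already in flight, so the caller simply backs off. */
static bool fake_recovery_begin(struct fake_device *dev)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&dev->in_gpu_reset, &expected, 1);
}

/* Must only be called by the thread whose begin call returned true. */
static void fake_recovery_end(struct fake_device *dev)
{
    assert(atomic_load(&dev->in_gpu_reset) == 1);
    atomic_store(&dev->in_gpu_reset, 0);
}

/* Query helper; the comparison already yields a bool, so no
 * "? true : false" is needed. */
static bool fake_in_reset(struct fake_device *dev)
{
    return atomic_load(&dev->in_gpu_reset) != 0;
}
```

Unlike a try_lock on the reset lock itself, this flag is only ever claimed by recovery, so an ordinary client holding the lock does not cause recovery to be skipped.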

v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15 Aug, 2020 2 commits

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

Committed by Christian König on Aug 12, 2020

The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems as this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add debugfs interface for RAP test · a4322e18

Committed by Wenhui Sheng on Aug 11, 2020

After the amdgpu driver has loaded successfully, we can use
the RAP debugfs interface <debugfs_dir>/dri/xxx/rap_test
to trigger the RAP test.

Currently only the L0 validate test is supported.

v2: refine amdgpu_rap.h
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: Guchun Chen <Guchun.Chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

28 Jul, 2020 1 commit

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

Committed by Dennis Li on Jul 08, 2020

When the GPU hangs, the driver has multiple paths to enter amdgpu_device_gpu_recover;
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe for other threads to access the GPU,
which may cause the GPU reset to fail. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protects the GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22 Jul, 2020 1 commit

drm/amdgpu: add read amdgpu_gfxoff status in debugfs · 443c7f3c

Committed by Jinzhou.Su on Jul 07, 2020

Add interface for SMU12 device, used by UMR.

v2: fix code style
Signed-off-by: Jinzhou.Su <Jinzhou.Su@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

15 Jul, 2020 1 commit

drm/amdgpu: fix preemption unit test · d845a205

Committed by Jack Xiao on Jul 10, 2020

Remove signaled jobs from the job list and ensure the
job was indeed preempted.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

openeuler / Kernel synced successfully about 2 years ago
