Commits · 892deb48269c65376f3eeb5b4c032ff2c2979bd7 · openeuler / Kernel

Dec 17, 2021 · 1 commit

drm/amdgpu: Separate vf2pf work item init from virt data exchange · 892deb48

Committed by Victor Skvortsov on Dec 16, 2021

We want to be able to call virt data exchange conditionally
after gmc sw init to reserve bad pages as early as possible.
Since this is a conditional call, we will need
to call it again unconditionally later in the init sequence.

Refactor the data exchange function so it can be
called multiple times without re-initializing the work item.

v2: Cleaned up the code. Kept the original call to init_exchange_data()
inside early init to initialize the work item, and afterwards call
exchange_data() when needed.
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Reviewed By: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

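The split described above (one-time work-item setup, repeatable data exchange) can be modeled in a few lines of userspace C. This is a minimal sketch, not the amdgpu code: `struct virt_state`, `virt_init_exchange()`, `virt_exchange()` and `virt_init_sequence()` are hypothetical stand-ins for the init_exchange_data()/exchange_data() pair the commit mentions, and the counters exist only to make the behavior observable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's virt state (not struct amdgpu_virt). */
struct virt_state {
	bool work_initialized; /* set once by the init step */
	int init_count;        /* how many times the work item was set up */
	int exchange_count;    /* how many pf2vf/vf2pf exchanges ran */
};

/* One-time step, analogous to init_exchange_data(): set up the periodic
 * vf2pf work item only if it does not exist yet. */
static void virt_init_exchange(struct virt_state *v)
{
	if (v->work_initialized)
		return; /* never re-initialize the work item */
	v->work_initialized = true;
	v->init_count++;
}

/* Repeatable step, analogous to exchange_data(): read the pf2vf message and
 * write the vf2pf message; it never touches the work item. */
static void virt_exchange(struct virt_state *v)
{
	v->exchange_count++;
}

/* Init order from the commit message: early init creates the work item, an
 * optional exchange after gmc sw init reserves bad pages early, and a later
 * exchange always runs. */
static void virt_init_sequence(struct virt_state *v, bool early_exchange)
{
	virt_init_exchange(v);   /* early init */
	if (early_exchange)
		virt_exchange(v);    /* after gmc sw init, conditional */
	virt_exchange(v);        /* later in init, unconditional */
}
```

With `early_exchange` set, the exchange runs twice while the work item is set up exactly once; the guard in `virt_init_exchange()` is what makes repeated calls harmless.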

May 20, 2021 · 1 commit

drm/amdgpu: Complete multimedia bandwidth interface · ed9d2053

Committed by Bokun Zhang on May 13, 2021

- Update the SR-IOV PF2VF header to the latest revision

- Extend the existing function in amdgpu_virt.c to read the MM bandwidth config
  from the PF2VF message

- Add the SR-IOV Sienna Cichlid codec array and update the bandwidth from the
  PF2VF message

v2: squash in removal of unused variable (Alex)
Signed-off-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Monk liu <monk.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Apr 10, 2021 · 3 commits

drm/amdgpu: indirect register access for nv12 sriov · 5d238510

Committed by Peng Ju Zhou on Mar 30, 2021

Use the control bits obtained from the host to control register access.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: indirect register access for nv12 sriov · 8b8a162d

Committed by Peng Ju Zhou on Mar 31, 2021

Unify the host driver and guest driver names for the indirect access
control bits.
Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Add new PF2VF flags for VF register access method · 4d675e1e

Committed by Rohit Khaire on Mar 29, 2021

Add 3 sub-flags to notify the guest to use indirect register access for
gc, mmhub and ih.

The host sets these flags depending on the L1 RAP version,
the ASIC and other scenarios. These flags ensure
compatibility between different guest/host/vbios versions.
Signed-off-by: Rohit Khaire <rohit.khaire@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

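A guest consuming the three sub-flags could decide per IP block whether to go through the indirect path. This is a minimal sketch under stated assumptions: the commit only says there is one flag each for gc, mmhub and ih, so the macro names, bit positions and `vf_use_indirect_access()` are hypothetical, not the real PF2VF layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; real names and positions may differ. */
#define VF_REG_ACCESS_IH    (1u << 0)
#define VF_REG_ACCESS_MMHUB (1u << 1)
#define VF_REG_ACCESS_GC    (1u << 2)

enum ip_block { IP_IH, IP_MMHUB, IP_GC };

/* Returns true if the host asked the guest to use indirect register
 * access for this IP block; a clear bit means direct access. */
static bool vf_use_indirect_access(uint32_t host_flags, enum ip_block ip)
{
	switch (ip) {
	case IP_IH:
		return host_flags & VF_REG_ACCESS_IH;
	case IP_MMHUB:
		return host_flags & VF_REG_ACCESS_MMHUB;
	case IP_GC:
		return host_flags & VF_REG_ACCESS_GC;
	}
	return false;
}
```

Keeping one bit per block means a host that never sets a bit leaves the guest on the direct path, which is one plausible way such flags preserve compatibility across host/guest versions.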

Sep 26, 2020 · 2 commits

drm/amdgpu: Implement new guest side VF2PF message transaction (v2) · 519b8b76

Committed by Bokun Zhang on Jul 28, 2020

- Refactor the driver code to use amdgpu_virt_read_pf2vf_data
  and amdgpu_virt_write_vf2pf_data instead of writing all the code in
  one function (the old amdgpu_virt_init_data_exchange)

- Add a new transaction method for VF2PF messages between the host
  and guest driver. The guest side periodically updates the VF2PF
  message in the framebuffer.

  In the new header, we include guest ucode information, guest
  framebuffer usage, and engine usage.

- Clean up the old macros, since they would cause compile errors if
  the new transaction method is used

v2: squash in build fix
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Update VF2PF interface · 1721bc1b

Committed by Bokun Zhang on Jul 15, 2020

- Update the guest side VF2PF interface header file
Signed-off-by: Bokun Zhang <Bokun.Zhang@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Aug 25, 2020 · 1 commit

drm/amdgpu: refine codes to avoid reentering GPU recovery · 53b3f8f4

Committed by Dennis Li on Aug 19, 2020

If another thread already holds the reset lock, recovery
fails because try_lock fails. Therefore we introduce the atomics
hive->in_reset and adev->in_gpu_reset to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

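The re-entry guard described above amounts to an atomic 0 to 1 transition that only one caller can win. A minimal userspace sketch with C11 atomics, assuming a hypothetical `struct gpu_dev`; in kernel code the same idea is usually expressed with atomic_cmpxchg() on an atomic_t, but the helpers below are illustrations, not the driver's functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device struct; only the reset flag matters here. */
struct gpu_dev {
	atomic_int in_gpu_reset; /* 0 = idle, 1 = recovery in progress */
};

/* Returns true if this caller won the right to run recovery. A second
 * caller sees 1, fails the exchange and must not re-enter recovery. */
static bool gpu_recover_begin(struct gpu_dev *dev)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&dev->in_gpu_reset, &expected, 1);
}

/* Clears the flag once recovery is finished. */
static void gpu_recover_end(struct gpu_dev *dev)
{
	atomic_store(&dev->in_gpu_reset, 0);
}

/* Plain query, in the spirit of the amdgpu_in_reset() wrapper from v2. */
static bool gpu_in_reset(struct gpu_dev *dev)
{
	return atomic_load(&dev->in_gpu_reset) != 0;
}
```

Unlike a try_lock on a lock that other paths may also hold, the flag only records "recovery in progress", so an unrelated lock holder cannot make recovery fail.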

Aug 15, 2020 · 1 commit

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

Committed by Christian König on Aug 12, 2020

The whole approach wasn't thought through to the end.

We already had a reset lock like this in the past, and it caused the same problems as this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Jul 28, 2020 · 1 commit

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

Committed by Dennis Li on Jul 08, 2020

When the GPU hangs, the driver has multiple paths into amdgpu_device_gpu_recover;
the atomics adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe for other threads to access the GPU,
which may cause the GPU reset to fail. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protects the GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and the file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in the kfd
driver.
3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid
re-entering GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all code paths, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-entering GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove commented-out code in amdgpu_device.c
3. add a more detailed comment in the commit message
4. define a wrapper function amdgpu_in_reset

v5:
1. Fix some style issues.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tukov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Jul 01, 2020 · 1 commit

drm/amdgpu: support reserve bad page for virt (v3) · 5278a159

Committed by Stanley.Yang on May 14, 2020

v1: rename some functions; only initialize the RAS error handler data for
    supported ASICs.

v2: fix potential memory leak.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


May 18, 2020 · 1 commit

drm/amdgpu: add amdgpu_virt_get_vf_mode helper function · a7f28103

Committed by Kevin Wang on Apr 29, 2020

swsmu or powerplay (hwmgr) needs to handle tasks differently depending on the VF mode;
this function helps query the VF mode.

vf mode:
1. SRIOV_VF_MODE_BARE_METAL: the driver works on the host OS (PF)
2. SRIOV_VF_MODE_ONE_VF    : the driver works on a guest OS with one VF
3. SRIOV_VF_MODE_MULTI_VF  : the driver works on a guest OS with multiple VFs
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

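The three VF modes listed above reduce to two yes/no questions. A minimal sketch, assuming two hypothetical inputs (whether the driver runs as a VF, and whether the host granted one-VF powerplay mode); the enum names come from the commit, but the numeric values and the `get_vf_mode()` helper are illustrative only.

```c
#include <stdbool.h>

/* Mode names from the commit message; numeric values are assumed. */
enum sriov_vf_mode {
	SRIOV_VF_MODE_BARE_METAL = 0,
	SRIOV_VF_MODE_ONE_VF,
	SRIOV_VF_MODE_MULTI_VF,
};

/* Hypothetical helper: is_vf says whether we run as a guest VF,
 * pp_one_vf says whether the host reports a single-VF configuration. */
static enum sriov_vf_mode get_vf_mode(bool is_vf, bool pp_one_vf)
{
	if (!is_vf)
		return SRIOV_VF_MODE_BARE_METAL; /* host OS (PF) */
	return pp_one_vf ? SRIOV_VF_MODE_ONE_VF : SRIOV_VF_MODE_MULTI_VF;
}
```

Callers such as swsmu or hwmgr could then switch on one enum instead of testing several flags at every site.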

Apr 14, 2020 · 2 commits

drm/amdgpu: resume kiq access debugfs · d32709da

Committed by Yintian Tao on Apr 13, 2020

If there is no GPU hang, the user can still access
debugfs through KIQ.
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Monk Liu <Monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: restrict debugfs register access under SR-IOV · 95a2f917

Committed by Yintian Tao on Apr 07, 2020

On bare metal, nothing else needs to be taken
care of for GPU register access through MMIO.
Under virtualization, GPU register access is
implemented through KIQ at run time because of
world switch.

Therefore, under SR-IOV the user can only access
debugfs to read/write GPU registers when all
three conditions below are met.
- amdgpu_gpu_recovery=0
- TDR happened
- in_gpu_reset=0

v2: merge amdgpu_virt_can_access_debugfs() into
    amdgpu_virt_enable_access_debugfs()

v3: drop ret variable in amdgpu_virt_enable_access_debugfs()
    and directly return result
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

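The three-condition rule above is a single predicate. A minimal sketch with a hypothetical `struct debugfs_ctx` whose fields mirror the listed conditions; it models only this commit's rule, not the follow-up d32709da, which re-allowed KIQ access when no hang has occurred.

```c
#include <stdbool.h>

/* Hypothetical inputs mirroring the conditions in the commit message. */
struct debugfs_ctx {
	bool is_sriov_vf;   /* running as an SR-IOV VF */
	int gpu_recovery;   /* value of the amdgpu_gpu_recovery parameter */
	bool tdr_happened;  /* a job timeout (TDR) has occurred */
	bool in_gpu_reset;  /* a reset is currently in progress */
};

/* Bare metal: plain MMIO, always allowed. SR-IOV: all three must hold. */
static bool can_access_debugfs_regs(const struct debugfs_ctx *c)
{
	if (!c->is_sriov_vf)
		return true;
	return c->gpu_recovery == 0 && c->tdr_happened && !c->in_gpu_reset;
}
```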

Apr 02, 2020 · 3 commits

drm/amdgpu: introduce new request and its function · aa53bc2e

Committed by Monk Liu on Mar 04, 2020

1) modify xgpu_nv_send_access_requests to support the
new IDH request

2) introduce a new function, req_gpu_init_data(), which
is used to notify the host to prepare the vbios/ip-discovery/pfvf exchange
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: cleanup all virtualization detection routine · 3aa0115d

Committed by Monk Liu on Mar 04, 2020

We need to move virt detection much earlier because:
1) The HW team confirmed to us that RCC_IOV_FUNC_IDENTIFIER will always
be at MMIO dword offset 0xDE5 from Vega10 onward, so there is no
need to implement a detect_hw_virt() routine in each nbio/chip file.
For VI SR-IOV chips (Tonga and Fiji), BIF_IOV_FUNC_IDENTIFIER is at
0x1503.

2) We need to know that we are an SR-IOV VF before we do IP discovery, because
the host updates the IP discovery content every time it receives
a new "REQ_GPU_INIT_DATA" request from the guest (patches for this
new handshake will follow soon).
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

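The first reason means detection collapses to one register at a known location per chip generation. A minimal sketch of picking that location; `enum chip_family` and `iov_identifier_reg()` are hypothetical, and the sketch assumes 0x1503 is a register (dword) index like 0xDE5, which the commit does not state explicitly. Which bits of the register mean "this is a VF" is also not given, so the sketch stops at choosing the location. Because MMIO registers are 32-bit, dword offset 0xDE5 is byte offset 0xDE5 × 4 = 0x3794.

```c
#include <stdint.h>

/* Locations from the commit message. */
#define RCC_IOV_FUNC_IDENTIFIER_DW  0xDE5u  /* Vega10 and later (dword offset) */
#define BIF_IOV_FUNC_IDENTIFIER_REG 0x1503u /* VI: Tonga, Fiji (assumed dword) */

/* Hypothetical family classification used only for this sketch. */
enum chip_family { FAMILY_VI, FAMILY_VEGA10_PLUS, FAMILY_OTHER };

/* Register index to read for VF detection; 0 means "no known location". */
static uint32_t iov_identifier_reg(enum chip_family fam)
{
	switch (fam) {
	case FAMILY_VI:
		return BIF_IOV_FUNC_IDENTIFIER_REG;
	case FAMILY_VEGA10_PLUS:
		return RCC_IOV_FUNC_IDENTIFIER_DW;
	default:
		return 0;
	}
}

/* 32-bit registers: byte offset = dword offset * 4. */
static uint32_t dw_to_byte_offset(uint32_t dw)
{
	return dw * 4u;
}
```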

drm/amdgpu: amends feature bits for MM bandwidth mgr · b89659b7

Committed by Monk Liu on Mar 03, 2020

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Mar 17, 2020 · 1 commit

drm/amdgpu: revise RLCG access path · 2e0cc4d4

Committed by Monk Liu on Mar 10, 2020

What changed:
1) provide a new implementation interface for the RLCG access path
2) put SQ_CMD/SQ_IND_INDEX on the GFX9 RLCG path so that debugfs's reg_op
function can access registers that need the RLCG path

Now even debugfs's reg_op can be used to dump waves.
Tested-by: Monk Liu <monk.liu@amd.com>
Tested-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Zhou pengju <pengju.zhou@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Jan 23, 2020 · 1 commit

drm/amdgpu: provide a generic function interface for reading/writing register by KIQ · d33a99c4

Committed by chen gong on Jan 15, 2020

Move the amdgpu_virt_kiq_rreg/amdgpu_virt_kiq_wreg functions to amdgpu_gfx.c
and rename them to amdgpu_kiq_rreg/amdgpu_kiq_wreg to make them generic and
flexible.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Dec 12, 2019 · 1 commit

drm/amd/powerplay: enable pp one vf mode for vega10 · c9ffa427

Committed by Yintian Tao on Oct 30, 2019

Originally, due to restrictions in PSP and SMU, the VF had
to send messages to the hypervisor driver to handle powerplay
changes, which was complicated and redundant. Now the SMU
and PSP allow the VF to handle powerplay
changes by itself. Therefore, the old handshake code
between VF and PF for powerplay is removed, and the VF
uses the new registers below to handshake with the SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response

v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip specified hwmon attributes
v6: change skip condition at vega10_copy_table_to_smc
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

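The three-register handshake above (message, parameter, response) can be modeled against a mock register file. This is a sketch under assumptions: the register array, `mock_smu_firmware_step()`, the response code 1 and the ordering (clear response, write parameter, write message, poll response) are illustrative, not the vega10 driver's exact sequence; real code would access the mmMP1_SMN_C2PMSG_10x registers over MMIO and the firmware would run asynchronously.

```c
#include <stdint.h>

/* Mock register file indexed by the register numbers named in the commit. */
enum { REG_MSG = 101, REG_PARAM = 102, REG_RESP = 103, NREGS = 104 };

struct mock_smu {
	uint32_t regs[NREGS];
};

/* Pretend firmware: consumes a pending message and posts a response. */
static void mock_smu_firmware_step(struct mock_smu *s)
{
	if (s->regs[REG_MSG] != 0) {
		s->regs[REG_RESP] = 1; /* hypothetical "OK" code */
		s->regs[REG_MSG] = 0;
	}
}

/* Hypothetical send helper: returns the response, or 0 on timeout. */
static uint32_t smu_send_msg(struct mock_smu *s, uint32_t msg, uint32_t param)
{
	s->regs[REG_RESP] = 0;      /* clear stale response first */
	s->regs[REG_PARAM] = param; /* parameter before message */
	s->regs[REG_MSG] = msg;     /* writing the message triggers the SMU */
	for (int i = 0; i < 1000; i++) {
		mock_smu_firmware_step(s); /* stands in for asynchronous firmware */
		if (s->regs[REG_RESP] != 0)
			return s->regs[REG_RESP];
	}
	return 0;
}
```

Writing the parameter before the message matters because the message write is what the firmware reacts to; clearing the response first keeps the poll from seeing a stale value.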

Aug 02, 2019 · 1 commit

drm/amdgpu: cleanup vega10 SRIOV code path · 4cd4c5c0

Committed by Monk Liu on Jul 30, 2019

We can simplify all those unnecessary functions under
SR-IOV for Vega10 since:
1) PSP L1 policy is force-enabled under SR-IOV
2) the original logic always set all flags, which made it
   a dummy step

Besides:
1) setting ih_doorbell_range should also be skipped
for Vega10 SR-IOV.
2) the gfx_common registers should also be skipped
for Vega10 SR-IOV.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Jun 22, 2019 · 1 commit

drm/amdgpu: program for resuming preempted ib · 43974dac

Committed by Jack Xiao on Jan 08, 2019

For a newly submitted IB, the CE/DE metadata should be programmed to 0;
for a partially executed IB, the CE/DE metadata should be restored.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

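The rule above is a two-way choice between zeroing and restoring. A minimal sketch with a hypothetical fixed-size `struct ib_metadata`; the real CE/DE metadata layouts are hardware-specific and not described in the commit.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-size payload standing in for CE or DE metadata. */
struct ib_metadata {
	unsigned int words[8];
};

/* New submission: start from all zeros. Resumed (partially executed) IB:
 * restore the state that was saved when it was preempted. */
static void program_ib_metadata(struct ib_metadata *out,
				const struct ib_metadata *saved, bool resumed)
{
	if (resumed && saved)
		*out = *saved;
	else
		memset(out, 0, sizeof(*out));
}
```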

May 25, 2019 · 1 commit

drm/amdgpu: init vega10 SR-IOV reg access mode · 78d48112

Committed by Trigger Huang on May 09, 2019

Set different register access modes according to the features
provided by the firmware.
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Apr 11, 2019 · 1 commit

drm/amdgpu: support dpm level modification under virtualization v3 · bb5a2bdf

Committed by Yintian Tao on Apr 09, 2019

Under Vega10 virtualization, the SMU IP block is not added.
Therefore, we need to add pp clk query and force dpm level functions
to amdgpu_virt_ops to support the feature.

v2: add get_pp_clk existence check and use kzalloc to allocate buf

v3: return -ENOMEM for allocation failure and correct the coding style
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

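The v2/v3 notes describe a common ops-table pattern: check that the optional hook exists, allocate a zeroed buffer, and report -ENOMEM on allocation failure. A userspace sketch of that pattern, where `struct virt_ops`, `virt_query_pp_clk()`, the buffer size and `fake_get_pp_clk()` are all hypothetical, calloc() stands in for kzalloc(), and returning -EINVAL for a missing hook is an assumption (the commit does not say what is returned).

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ops table in the spirit of amdgpu_virt_ops. */
struct virt_ops {
	int (*get_pp_clk)(char *buf, size_t len); /* may be NULL */
};

#define PP_CLK_BUF_SIZE 4096 /* assumed size */

/* Query clocks through the optional hook; copies the text into out. */
static int virt_query_pp_clk(const struct virt_ops *ops, char *out, size_t out_len)
{
	char *buf;
	int ret;

	if (!ops || !ops->get_pp_clk)      /* v2: existence check */
		return -EINVAL;                /* assumed error code */
	buf = calloc(1, PP_CLK_BUF_SIZE);  /* userspace analogue of kzalloc */
	if (!buf)
		return -ENOMEM;                /* v3 */
	ret = ops->get_pp_clk(buf, PP_CLK_BUF_SIZE);
	if (ret >= 0)
		snprintf(out, out_len, "%s", buf);
	free(buf);
	return ret;
}

/* Hypothetical hook standing in for a host-backed implementation. */
static int fake_get_pp_clk(char *buf, size_t len)
{
	snprintf(buf, len, "0: 300Mhz *\n");
	return (int)strlen(buf);
}
```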

Nov 20, 2018 · 1 commit

drm/amd/amdgpu/sriov: Aligned the definition with libgv · bed1ed36

Committed by Emily Deng on Nov 14, 2018

Aligned the amd_sriov_msg_pf2vf_info_header and amd_sriov_msg_pf2vf_info_header's
definition with libgv.
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Frank.Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Nov 06, 2018 · 4 commits

drm/amdgpu: cleanup GMC v9 TLB invalidation · af5fe1e9

Committed by Christian König on Oct 25, 2018

Move the KIQ handling into amdgpu_virt.c and drop the fallback.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Move csa related code to separate file · 7946340f

Committed by Rex Zhu on Oct 19, 2018

Bare metal also needs to reserve the CSA for preemption,
so move the CSA-related code out of SR-IOV.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Refine CSA related functions · 1e256e27

Committed by Rex Zhu on Oct 15, 2018

There are no functional changes.
Use function arguments for the SR-IOV-specific variables that
were hardcoded in those functions.

This lets us share those functions with bare metal.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu: Remove useless csa gpu address in vmid0 · 20bedfe0

Committed by Rex Zhu on Oct 16, 2018

The driver has not used this address so far.
Reviewed-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Feb 20, 2018 · 1 commit

drm/amdgpu: move static CSA address to top of address space v2 · 6f05c4e9

Committed by Christian König on Jan 22, 2018

Move the CSA area to the top of the VA space to avoid clashing with
HMM/ATC in the lower range on GFX9.

v2: wrong sign noticed by Roger, rebase on CSA_VADDR cleanup, handle VA
hole on GFX9 as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

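Placing the CSA at the top of the VA space, and handling the GFX9 VA hole, is address arithmetic. A minimal sketch under stated assumptions: the 4 KiB GPU page shift, the 48-bit hole bounds (0x0000800000000000 / 0xffff800000000000) and the 2 MiB reserved size are assumed constants, and `csa_top_vaddr()`/`sign_extend_va()` are hypothetical helpers. Addresses at or above the hole start are sign-extended into the upper canonical half, which is the kind of "sign" detail the v2 note refers to.

```c
#include <stdint.h>

/* Assumed constants for a 48-bit GFX9-style VA space with a canonical hole. */
#define GPU_PAGE_SHIFT 12
#define VA_HOLE_START  0x0000800000000000ULL
#define VA_HOLE_END    0xffff800000000000ULL
#define CSA_RESERVED   (2ULL << 20) /* hypothetical reserved size: 2 MiB */

/* Move addresses in the upper half of the 48-bit range above the hole. */
static uint64_t sign_extend_va(uint64_t addr)
{
	if (addr >= VA_HOLE_START)
		addr |= VA_HOLE_END;
	return addr;
}

/* CSA address: carve the reserved area from the very top of the VA space. */
static uint64_t csa_top_vaddr(uint64_t max_pfn)
{
	uint64_t addr = max_pfn << GPU_PAGE_SHIFT; /* end of VA space */
	addr -= CSA_RESERVED;
	return sign_extend_va(addr);
}
```

For a 48-bit space (max_pfn = 2^36) this yields 0xFFFFFFFFFFE00000; for a 40-bit space (max_pfn = 2^28) the address 0xFFFFE00000 is below the hole and is left unchanged.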

Dec 07, 2017 · 1 commit

drm/amdgpu:free CSA in unified place · 84e5b516

Committed by Monk Liu on Nov 14, 2017

Free the CSA in one place instead of in each GFX IP's sw_fini.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Dec 05, 2017 · 5 commits

drm/amdgpu:read VRAMLOST from gim · 75bc6099

Committed by Monk Liu on Oct 30, 2017

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu:cleanup in_sriov_reset and lock_reset · 13a752e3

Committed by Monk Liu on Oct 17, 2017

Since GPU reset is now unified with gpu_recover
for both bare metal and SR-IOV:

1) rename in_sriov_reset to in_gpu_reset
2) move lock_reset from adev->virt to adev
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu:implement new GPU recover(v3) · 5740682e

Committed by Monk Liu on Oct 25, 2017

1. The new implementation is named amdgpu_gpu_recover, which hints better
at what it does than gpu_reset.

2. gpu_recover unifies bare metal and SR-IOV; only the ASIC reset
part is implemented differently.

3. gpu_recover increases the hung job's karma and marks its entity/context
as guilty if the karma exceeds the limit.

V2:

4. In the scheduler main routine, a job from a guilty context is immediately
fake-signaled after it is popped from the queue, and its fence is set with the
"-ECANCELED" error.

5. In the scheduler recovery routine, all jobs from the guilty entity are
dropped.

6. In the run_job() routine, the real IB submission is skipped if the @skip parameter
is true or VRAM loss occurred.

V3:

7. Replace the deprecated GPU reset with the new GPU recover.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

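Items 3 and 4 above describe a karma counter with a guilty threshold, and fake-signaling of a guilty context's jobs with -ECANCELED. A minimal sketch; `struct sched_entity`, `struct sched_job`, `HANG_LIMIT` and both helpers are hypothetical simplifications, not the GPU scheduler's real types or functions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simplified types for illustrating karma/guilty handling. */
struct sched_entity {
	int karma;
	bool guilty;
};

struct sched_job {
	struct sched_entity *entity;
	int fence_error;
	bool signaled;
};

#define HANG_LIMIT 1 /* hypothetical threshold */

/* Item 3: bump the hung job's karma; mark its entity guilty past the limit. */
static void job_hang_detected(struct sched_job *job)
{
	struct sched_entity *e = job->entity;

	if (++e->karma > HANG_LIMIT)
		e->guilty = true;
}

/* Item 4: when a job is popped, fake-signal it if its context is guilty.
 * Returns true if the job should actually be submitted. */
static bool job_pop(struct sched_job *job)
{
	if (job->entity->guilty) {
		job->fence_error = -ECANCELED;
		job->signaled = true;
		return false;
	}
	return true;
}
```

With a limit of 1, the first hang only raises karma and a second hang marks the context guilty, after which its queued work completes immediately with an error instead of reaching the hardware.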

drm/amdgpu/virt: add wait_reset virt ops · b636176e

Committed by pding on Oct 24, 2017

The driver can use this interface to check whether a function-level
reset has been done in the hypervisor. It is helpful when the IRQ handler for reset
is not ready, or when special handling is required.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: pding <Pixel.Ding@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


drm/amdgpu/virt: add function to check MMIO (v2) · a16f8f11

Committed by pding on Oct 24, 2017

MMIO space can be blocked on a virtualised device. Add this
function to check whether MMIO is blocked.

TODO: need a reliable method, such as communication
with the hypervisor.

v2:
 - add comments inline
Signed-off-by: pding <Pixel.Ding@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

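The commit does not describe how the check works, and its TODO admits the method is not fully reliable. One common heuristic for PCI devices is that a read from inaccessible MMIO returns all ones; the sketch below assumes that heuristic. `mmio_read_fn`, `mmio_blocked()`, the probe register value and the two mock readers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register reader abstraction so the check can be exercised without hardware. */
typedef uint32_t (*mmio_read_fn)(uint32_t reg);

/* Assumed heuristic: blocked MMIO reads back as 0xffffffff. */
static bool mmio_blocked(mmio_read_fn rd, uint32_t probe_reg)
{
	return rd(probe_reg) == 0xffffffffu;
}

/* Mock readers: one simulates blocked MMIO, one returns ordinary data. */
static uint32_t read_blocked(uint32_t reg)
{
	(void)reg;
	return 0xffffffffu;
}

static uint32_t read_ok(uint32_t reg)
{
	return reg ^ 0x1u;
}
```

A register that can legitimately read as all ones would give a false positive, which is one reason a hypervisor handshake would be more reliable.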

Oct 20, 2017 · 1 commit

drm/amdgpu: SR-IOV data exchange between PF&VF · 2dc8f81e

Committed by Horace Chen on Oct 09, 2017

SR-IOV needs to exchange some data between the PF and VF through shared VRAM.

The PF copies some necessary firmware and information to the shared
VRAM. It also requires some information from the VF. The PF sends a
key through mailbox2 to help the guest calculate a checksum so that it can
verify whether the data is correct.

So check the data at the specified offset of the shared VRAM; if the
checksum is right, read values from it and write some VF information
next to the data from the PF.
Signed-off-by: Horace Chen <horace.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

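The key-seeded checksum check can be sketched as a byte sum. The commit only says that the key comes through mailbox2 and is used to compute a checksum; the scheme below (start from the key, add every byte of the structure, subtract the bytes of the stored checksum field itself) is an assumption, as are `struct pf2vf_blob`, `blob_checksum()` and `blob_verify()`. Excluding the checksum field's own bytes lets the writer compute the value before storing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-VRAM structure; the checksum lives inside it. */
struct pf2vf_blob {
	uint32_t size;
	uint32_t checksum;
	uint32_t payload[4];
};

/* Byte sum seeded with the key, excluding the stored checksum's own bytes. */
static uint32_t blob_checksum(const void *obj, size_t len, uint32_t key,
			      uint32_t stored)
{
	const unsigned char *p = obj;
	const unsigned char *c = (const unsigned char *)&stored;
	uint32_t sum = key;

	for (size_t i = 0; i < len; i++)
		sum += p[i];
	for (size_t i = 0; i < sizeof(stored); i++)
		sum -= c[i];
	return sum;
}

/* Guest side: accept the data only if the recomputed checksum matches. */
static int blob_verify(const struct pf2vf_blob *b, uint32_t key)
{
	return blob_checksum(b, sizeof(*b), key, b->checksum) == b->checksum;
}
```

For a blob with size 24 and payload {1, 2, 3, 4}, the byte sum excluding the checksum field is 34, so with key 0x1234 the checksum is 0x1256; flipping any payload byte or using a different key makes verification fail.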

Sep 29, 2017 · 1 commit

drm/amdgpu: fix vf error handling · e23b74aa

Committed by Alex Deucher on Sep 28, 2017

The error handling for virtual functions assumed a single
VF per VM and didn't properly account for bare metal. Make
the error arrays per device and add locking.
Reviewed-by: Gavin Wan <gavin.wan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Aug 18, 2017 · 1 commit

drm/amdgpu: cleanup static CSA handling · 0f4b3c68

Committed by Christian König on Jul 31, 2017

Move the CSA bo_va from the VM to the fpriv structure.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


Jul 14, 2017 · 1 commit

drm/amdgpu: Support passing amdgpu critical error to host via GPU Mailbox. · 89041940

Committed by Gavin Wan on Jun 23, 2017

This feature works in an SR-IOV environment. In a non-SR-IOV environment, the
trans_error function does nothing.

The error information includes error_code (16 bit), error_flags (16 bit)
and error_data (64 bit). Since there are not many errors, we keep the
errors in an array and transfer all errors to the host before the amdgpu
initialization function (amdgpu_device_init) exits.
Signed-off-by: Gavin Wan <Gavin.Wan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

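The error record described above (16-bit code, 16-bit flags, 64-bit data) kept in an array and flushed to the host in one pass can be sketched as follows. `struct vf_error`, `struct vf_error_log`, the capacity, the drop-when-full policy and the mailbox callback are hypothetical; a per-device log (as e23b74aa later required) would also need a lock, which this single-threaded sketch omits.

```c
#include <stddef.h>
#include <stdint.h>

#define VF_ERROR_MAX 16 /* hypothetical capacity */

/* Field widths from the commit message. */
struct vf_error {
	uint16_t code;
	uint16_t flags;
	uint64_t data;
};

/* Hypothetical per-device log; locking omitted in this sketch. */
struct vf_error_log {
	struct vf_error entries[VF_ERROR_MAX];
	size_t count;
};

/* Record an error; entries beyond capacity are dropped (assumed policy). */
static void vf_error_put(struct vf_error_log *log, uint16_t code,
			 uint16_t flags, uint64_t data)
{
	if (log->count < VF_ERROR_MAX)
		log->entries[log->count++] = (struct vf_error){ code, flags, data };
}

typedef void (*mailbox_send_fn)(const struct vf_error *e, void *ctx);

/* Send every queued error to the host, then empty the log. */
static size_t vf_error_trans_all(struct vf_error_log *log, mailbox_send_fn send,
				 void *ctx)
{
	size_t n = log->count;

	for (size_t i = 0; i < n; i++)
		send(&log->entries[i], ctx);
	log->count = 0;
	return n;
}

/* Hypothetical mailbox stub that just counts messages. */
static void count_send(const struct vf_error *e, void *ctx)
{
	(void)e;
	(*(size_t *)ctx)++;
}
```

In the flow the commit describes, the flush would run once near the end of device initialization; on bare metal it would simply not be called.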

openeuler / Kernel: synced successfully about 2 years ago
