Commits · 8066008482e533e91934bee49765bf8b4a7c40db · openeuler / Kernel

16 Sep, 2021 1 commit

drm/amdgpu: add amdgpu_amdkfd_resume_iommu · 80660084

Authored by James Zhu on Sep 07, 2021

Add amdgpu_amdkfd_resume_iommu for amdgpu.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

80660084

17 Aug, 2021 1 commit

drm/amd/amdgpu embed hw_fence into amdgpu_job · c530b02f

Authored by Jack Zhang on May 12, 2021

Why: Previously the hw fence was allocated separately from the job.
This caused historical lifetime issues and corner cases.
The ideal situation is to let the fence manage both the job's
and the fence's lifetime, and to simplify the design of the gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover normal job submission with this method.
2. For ib_test and submissions without a parent job, keep the
legacy way of creating a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5:
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Monk Liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c530b02f

23 Jul, 2021 2 commits

drm/amdkfd: report pcie bandwidth to the kfd · 93304810

Authored by Jonathan Kim on Jun 02, 2021

Similar to xGMI reporting the min/max bandwidth between direct peers, PCIe
will report the min/max bandwidth to the KFD.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

93304810

drm/amdkfd: report xgmi bandwidth between direct peers to the kfd · 3f46c4e9

Authored by Jonathan Kim on May 12, 2021

Report the min/max bandwidth in megabytes to the kfd for direct
xgmi connections only. Indirect peers will report 0 since the
indirect route is unknown.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3f46c4e9

20 May, 2021 2 commits

drm/amdgpu: Add early fini callback · e9669fb7

Authored by Andrey Grodzovsky on May 19, 2021

Use it to call display code dependent on device->drv_data
before it's set to NULL on device unplug

v5:
Move HW finalization into this callback to prevent MMIO accesses
post PCI remove.

v7:
Split kfd suspend from device exit to expedite HW related
stuff to amdgpu_pci_remove

v8:
Squash previous KFD commit into this commit to avoid compile break.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com

e9669fb7

drm/amdkfd: heavy-weight flush TLB after unmap · 765385ec

Authored by Philip Yang on May 13, 2021

We need to do a heavy-weight TLB flush to make sure no dirty data
remains in the cache for the unmapped pages.

Define enum TLB_FLUSH_TYPE, add flush_type parameter to
amdgpu_amdkfd_flush_gpu_tlb_pasid.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

765385ec

10 Apr, 2021 3 commits

drm/amdgpu: use amdgpu_bo_user bo for metadata and tiling flag · cc1bcf85

Authored by Nirmoy Das on Mar 08, 2021

Tiling flag and metadata are only needed for BOs created by
amdgpu_gem_object_create(), so we can remove those from the
base class.

v2: * squash tiling_flags and metadata related patches into one
    * use BUG_ON for non ttm_bo_type_device type when accessing
    tiling_flags and metadata.
v3: * include to_amdgpu_bo_user

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cc1bcf85

drm/amdgpu: use amdgpu_bo_create_user() for when possible · 22b40f7a

Authored by Nirmoy Das on Mar 09, 2021

Use amdgpu_bo_create_user() for all the BO allocations for
ttm_bo_type_device type.

v2: include amdgpu_amdkfd_alloc_gws() as well, since it calls
    amdgpu_bo_create() for ttm_bo_type_device

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22b40f7a

drm/amdgpu: allow variable BO struct creation · 9fd5543e

Authored by Nirmoy Das on Mar 08, 2021

Allow allocating BO structures with a different structure size
than struct amdgpu_bo.

v2: Check bo_ptr_size in all amdgpu_bo_create() callers.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9fd5543e
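
The idea in 9fd5543e of allocating BO structures larger than the base struct can be illustrated with a small user-space sketch. Everything below is hypothetical: the `sketch_` types and helpers are stand-ins invented for illustration, not the driver's structures or its `amdgpu_bo_create()` signature. Only the notion of a caller-supplied `bo_ptr_size` that gets checked comes from the commit message.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the real layout. */
struct sketch_bo {
    size_t size;          /* size of the buffer the object describes */
};

struct sketch_bo_user {
    struct sketch_bo bo;  /* base object must be the first member */
    unsigned long tiling_flags;
};

/*
 * Allocate a zeroed object of bo_ptr_size bytes and return it as the base
 * type. Returns NULL if the requested size cannot hold the base struct or
 * if allocation fails.
 */
static struct sketch_bo *sketch_bo_create(size_t bo_ptr_size, size_t size)
{
    struct sketch_bo *bo;

    if (bo_ptr_size < sizeof(struct sketch_bo))
        return NULL;

    bo = calloc(1, bo_ptr_size);
    if (!bo)
        return NULL;

    bo->size = size;
    return bo;
}

/* Recover the derived type; valid only for objects created with its size. */
static struct sketch_bo_user *to_sketch_bo_user(struct sketch_bo *bo)
{
    return (struct sketch_bo_user *)bo;
}
```

A caller that wants the extended object would pass `sizeof(struct sketch_bo_user)` as `bo_ptr_size` and later release the object with `free()`. Because the base struct is the first member, the same pointer serves as both types.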

24 Mar, 2021 3 commits

Revert "drm/amdgpu: During compute disable GFXOFF for Sienna_Cichlid" · 6dffd9dc

Authored by Harish Kasiviswanathan on Mar 09, 2021

This reverts commit 73bf5cad.

Fixed in newer firmware

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6dffd9dc

drm/amdgpu: Add kfd init_complete flag to check from amdgpu side · 8e2712e7

Authored by shaoyunl on Feb 16, 2021

The amdgpu driver may be in reset state during init, in which case it will
not initialize the KFD. The driver needs to initialize the KFD after reset
by checking the flag.

Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8e2712e7

drm/amdgpu: Use free system memory size for kfd memory accounting · df23d1bb

Authored by Oak Zeng on Jan 18, 2021

With the current kfd memory accounting scheme, kfd applications
can use up to 15/16 of total system memory. On a system with a
small total system memory size, this leaves little system memory
for the OS. For example, if the system has a total of 16GB of system
memory, this scheme leaves the OS and non-kfd applications only 1GB
of system memory. In many cases, this triggers the OOM killer.

This patch changes the KFD system memory accounting scheme to use
15/16 of the free system memory at the time the kfd driver loads. This
deducts the system memory that the OS already uses.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

df23d1bb
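
The arithmetic behind df23d1bb can be checked with a short sketch. The helper name is hypothetical, and computing 15/16 as `base - base/16` is an assumption made here for illustration; the commit message only states the 15/16 ratio and the switch from total to free memory as the base.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the limit is 15/16 of the given base, computed as
 * base - base/16. Pass total memory for the old scheme and free memory at
 * driver load for the new one.
 */
static uint64_t sketch_kfd_mem_limit(uint64_t base_bytes)
{
    return base_bytes - (base_bytes >> 4);
}
```

With 16 GiB of total memory as the base, the limit is 15 GiB, leaving 1 GiB for everything else, which matches the example in the commit message. If only 10 GiB are free when the driver loads, the new scheme caps KFD at 9.375 GiB instead.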

29 Jan, 2021 1 commit

drm/amd/amdkfd: adjust dummy functions' placement · cd63989e

Authored by Lang Yu on Jan 28, 2021

Move all the dummy functions in amdgpu_amdkfd.c to
amdgpu_amdkfd.h as inline functions.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cd63989e

05 Nov, 2020 1 commit

drm/amdgpu: Change the way to determine framebuffer type · 4c7e8a9e

Authored by Gang Ba on Oct 08, 2020

Determine FRAMEBUFFER_PUBLIC/PRIVATE based only on host-accessibility,
not peer-accessibility

Signed-off-by: Gang Ba <gaba@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4c7e8a9e

24 Oct, 2020 1 commit

drm/amdgpu: During compute disable GFXOFF for Sienna_Cichlid · 73bf5cad

Authored by Harish Kasiviswanathan on Oct 22, 2020

Workaround to fix the soft hang observed in certain compute
applications.

Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

73bf5cad

10 Oct, 2020 2 commits

drm/amdgpu: kfd_initialized can be static · 0224b275

Authored by kernel test robot on Sep 23, 2020

Fixes: c7651b73 ("drm/amdgpu: Fix handling of KFD initialization failures")
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0224b275

drm/amdgpu: kfd_initialized can be static · 402bde58

Authored by kernel test robot on Sep 23, 2020

Fixes: c7651b73 ("drm/amdgpu: Fix handling of KFD initialization failures")
Signed-off-by: kernel test robot <lkp@intel.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

402bde58

26 Sep, 2020 1 commit

drm/amdgpu: store noretry parameter per driver instance · 9b498efa

Authored by Alex Deucher on Sep 23, 2020

This will allow us to have different defaults per asic
in a future patch.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9b498efa

23 Sep, 2020 1 commit

drm/amdgpu: Fix handling of KFD initialization failures · c7651b73

Authored by Felix Kuehling on Sep 16, 2020

Remember KFD module initialization status in a global variable. Skip KFD
device probing when the module was not initialized. Other amdgpu_amdkfd
calls are then protected by the adev->kfd.dev check.

Also print a clear error message when KFD disables itself. Amdgpu
continues its initialization even when KFD failed.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c7651b73
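
The control flow described in c7651b73 can be sketched roughly as follows. All `sketch_` names are hypothetical stand-ins; only the idea of a module-wide status flag, skipped probing, and a per-device pointer check is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Module-wide status, remembered once at module init time. */
static bool sketch_kfd_initialized;

struct sketch_adev {
    void *kfd_dev;   /* stays NULL when KFD is unavailable */
};

/* Record whether the (hypothetical) KFD module init succeeded. */
static int sketch_kfd_module_init(int init_result)
{
    sketch_kfd_initialized = (init_result == 0);
    return init_result;
}

/* Skip probing entirely when the module never initialized. */
static void sketch_kfd_device_probe(struct sketch_adev *adev, void *dev)
{
    if (!sketch_kfd_initialized)
        return;
    adev->kfd_dev = dev;
}

/* Later calls are guarded by the per-device pointer check. */
static bool sketch_kfd_device_usable(const struct sketch_adev *adev)
{
    return adev->kfd_dev != NULL;
}
```

Because a skipped probe leaves the device pointer NULL, every later entry point only needs the pointer check and does not have to consult the global flag again.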

25 Aug, 2020 2 commits

drm/amdgpu: Get DRM dev from adev by inline-f · 4a580877

Authored by Luben Tuikov on Aug 24, 2020

Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

4a580877

drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) · 1348969a

Authored by Luben Tuikov on Aug 24, 2020

Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
    instead of an invisible macro.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1348969a
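
The two conversion helpers from 4a580877 and 1348969a can be pictured with the sketch below. The structs are simplified, hypothetical stand-ins, and the way each pointer is stored (a private back pointer in one direction, a plain member in the other) is an assumption for illustration rather than the driver's actual layout.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the two device structures. */
struct sketch_drm_device {
    void *dev_private;               /* driver-private back pointer */
};

struct sketch_amdgpu_device {
    struct sketch_drm_device *ddev;  /* associated DRM device */
};

/* Typed conversion: the compiler checks argument and return types. */
static inline struct sketch_amdgpu_device *
sketch_drm_to_adev(struct sketch_drm_device *ddev)
{
    return ddev->dev_private;
}

/* The inverse helper, again with checked types. */
static inline struct sketch_drm_device *
sketch_adev_to_drm(struct sketch_amdgpu_device *adev)
{
    return adev->ddev;
}
```

Compared with a macro, passing the wrong pointer type to either helper produces a compiler diagnostic, and the storage detail is hidden in one place, so callers do not change if the layout does.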

15 Aug, 2020 1 commit

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

Authored by Christian König on Aug 12, 2020

The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems as this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f1403342

08 Aug, 2020 1 commit

drm/amdgpu: unlock mutex on error · 94561899

Authored by Dennis Li on Aug 04, 2020

Make sure to unlock the mutex when an error happens

v2:
1. correct syntax error in the commit comments
2. remove change-Id

Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

94561899
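
The class of bug addressed by 94561899 is commonly avoided in C with a single exit label that releases the lock. The sketch below illustrates that general pattern only; it is not the patch itself, and the minimal `sketch_mutex` type is a hypothetical stand-in for a real mutex.

```c
#include <stdbool.h>

/* Hypothetical minimal lock that only tracks whether it is held. */
struct sketch_mutex {
    bool held;
};

static void sketch_lock(struct sketch_mutex *m)   { m->held = true; }
static void sketch_unlock(struct sketch_mutex *m) { m->held = false; }

/*
 * Do some work under the lock. Both the success and the error path leave
 * through the same label, so the lock is always released.
 */
static int sketch_do_work(struct sketch_mutex *m, bool fail)
{
    int ret = 0;

    sketch_lock(m);

    if (fail) {
        ret = -1;        /* error: must not return with the lock held */
        goto out_unlock;
    }

    /* ... work that needs the lock would go here ... */

out_unlock:
    sketch_unlock(m);
    return ret;
}
```

Routing every early exit through `out_unlock` keeps the unlock in one place, so adding a new error check later cannot silently skip it.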

06 Aug, 2020 2 commits

drm/ttm: rename ttm_mem_type_manager -> ttm_resource_manager. · 9de59bc2

Authored by Dave Airlie on Aug 04, 2020

This name makes a lot more sense, since these are about managing
driver resources rather than just memory ranges.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200804025632.3868079-59-airlied@gmail.com

9de59bc2

drm/amdgfx/ttm: use wrapper to get ttm memory managers · 6c28aed6

Authored by Dave Airlie on Aug 04, 2020

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200804025632.3868079-38-airlied@gmail.com

6c28aed6

28 Jul, 2020 2 commits

drm/amdkfd: Add thermal throttling SMI event · 2c2b0d88

Authored by Mukul Joshi on Jul 23, 2020

Add support for reporting thermal throttling events through SMI.
Also, add a counter to count the number of throttling interrupts
observed and report the count in the SMI event message.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2c2b0d88

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

Authored by Dennis Li on Jul 08, 2020

When the GPU hangs, the driver has multiple paths into
amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and
hive->in_reset are used to avoid re-entering GPU recovery.

During GPU reset and resume, it is unsafe for other threads to access the
GPU, which may cause the GPU reset to fail. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protects the GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset to atomic, to avoid
re-entering GPU recovery for the same GPU hang.

v3:
1. change back to using adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all the code, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-entering GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove commented-out code in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrapper function amdgpu_in_reset

v5:
1. Fix some style issues.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

df9c8d1a
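
The "atomic flag to avoid re-entering recovery" part of df9c8d1a can be sketched in portable C11. This is an illustration under assumptions: the `sketch_` names are hypothetical, C11 `stdatomic.h` stands in for the kernel's atomic helpers, and the rw_semaphore side of the patch is not modeled. Note also that f1403342, listed above, later reverted this commit.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_dev {
    atomic_int in_gpu_reset;   /* 0 = idle, 1 = recovery in progress */
};

/* Returns true only for the caller that flips the flag from 0 to 1. */
static bool sketch_begin_recovery(struct sketch_dev *dev)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&dev->in_gpu_reset, &expected, 1);
}

/* Clear the flag so that a later hang can start recovery again. */
static void sketch_end_recovery(struct sketch_dev *dev)
{
    atomic_store(&dev->in_gpu_reset, 0);
}
```

If several paths detect the same hang at once, only the first call to `sketch_begin_recovery()` succeeds; the others see the flag already set and can back off instead of starting a second recovery.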

03 Jul, 2020 1 commit

drm/amdgpu: Clean up KFD VMID assignment · 40111ec2

Authored by Felix Kuehling on Jun 24, 2020

The KFD VMID assignment was hard-coded in a few places. Consolidate that in
a single variable adev->vm_manager.first_kfd_vmid. The value is still
assigned in gmc-ip-version-specific code.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

40111ec2

28 Apr, 2020 1 commit

drm/amdkfd: Put ASIC revision into HSA capability · c6d1ec41

Authored by Joseph Greathouse on Apr 16, 2020

In order to surface the ASIC revision to user level, we want
to put it into the HSA topology. This is because different
ASIC revisions may require user-level software to do different
things (e.g. patch code for things that are changed in later
hardware revisions).

The ASIC revision from the hardware is a maximum of 4 bits at this
time, so put it into 4 of the open bits in the HSA capability.
Then user-level software can use this capability information to
know -- for each ASIC -- what revision-based things must be done.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c6d1ec41
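
Packing a 4-bit revision into spare bits of a 32-bit capability word, as c6d1ec41 describes, looks roughly like the sketch below. The shift value and all `SKETCH_`/`sketch_` names are hypothetical; the commit message says only that 4 open bits are used, not which ones.

```c
#include <stdint.h>

/* Hypothetical bit position; the real field location is not given above. */
#define SKETCH_ASIC_REV_SHIFT 22
#define SKETCH_ASIC_REV_MASK  (UINT32_C(0xF) << SKETCH_ASIC_REV_SHIFT)

/* Store the low 4 bits of rev in the capability word, keeping other bits. */
static uint32_t sketch_cap_set_asic_rev(uint32_t cap, uint32_t rev)
{
    cap &= ~SKETCH_ASIC_REV_MASK;
    cap |= (rev << SKETCH_ASIC_REV_SHIFT) & SKETCH_ASIC_REV_MASK;
    return cap;
}

/* Read the 4-bit revision back out, as user-level software would. */
static uint32_t sketch_cap_get_asic_rev(uint32_t cap)
{
    return (cap & SKETCH_ASIC_REV_MASK) >> SKETCH_ASIC_REV_SHIFT;
}
```

Masking on write means a revision wider than 4 bits is truncated rather than spilling into neighboring capability bits, and all other bits of the word pass through unchanged.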

11 Mar, 2020 1 commit

drm/amdkfd: Consolidate duplicated bo alloc flags · 1d251d90

Authored by Yong Zhao on Mar 04, 2020

The ALLOC_MEM_FLAGS_* values are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*
values, but the two sets are mixed together in the kernel driver, resulting
in poor readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not
referenced in the kernel, and it functions implicitly in the kernel through
ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion.

Replace all occurrences of ALLOC_MEM_FLAGS_* with
KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d251d90

07 Mar, 2020 1 commit

drm/amdgpu: Use better names to reflect it is CP MQD buffer · fa5bde80

Authored by Yong Zhao on Mar 06, 2020

Add "CP" to AMDGPU_GEM_CREATE_MQD_GFX9 to indicate it is only for CP MQD
buffer.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fa5bde80

27 Feb, 2020 4 commits

drm/amdkfd: Avoid ambiguity by indicating it's cp queue · e6945304

Authored by Yong Zhao on Jan 30, 2020

The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e6945304

drm/amd: Extend ROCt to surface UUID for devices that have them · 0c663695

Authored by Divya Shikre on Feb 25, 2020

Devices from Arcturus onwards will have their UUID exposed to Thunk.
Add the necessary functions to the kernel to propagate the UUID.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0c663695

drm/amdgpu: Fix check for DPM when returning max clock · 944effd3

Authored by Kent Russell on Feb 25, 2020

pp_funcs may not exist, while dpm may be enabled. This change ensures
that KFD topology will report the same as pp_dpm_sclk, as the conditions
for reporting them will be the same.

Otherwise, we may see the issue where KFD reports "100MHz" in topology
as the max speed, while DPM is working correctly.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

944effd3

drm/amdgpu: Remove kfd eviction fence before release bo (v2) · f4a3c42b

Authored by xinhui pan on Feb 11, 2020

No need to trigger eviction as the memory mapping will not be used
anymore.

All pt/pd bos share the same resv, hence the same shared eviction fence.
Every time a page table is freed, the fence will be signaled, and that
causes unexpected kfd evictions.

v2: squash in 32 bit fix

CC: Christian König <christian.koenig@amd.com>
CC: Felix Kuehling <felix.kuehling@amd.com>
CC: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f4a3c42b

26 Feb, 2020 1 commit

drm/amdgpu: Improve Vega20 XGMI TLB flush workaround · b80cd524

Authored by Felix Kuehling on Jan 17, 2020

Using a heavy-weight TLB flush once is not sufficient. Concurrent
memory accesses in the same TLB cache line can re-populate TLB entries
from stale texture cache (TC) entries while the heavy-weight TLB
flush is in progress. To fix this race condition, perform another TLB
flush after the heavy-weight one, when TC is known to be clean.

Move the workaround into the low-level TLB flushing functions. This way
they apply to amdgpu as well, and KIQ-based TLB flush only needs to
synchronize once.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b80cd524

13 Feb, 2020 1 commit

drm/amdkfd: refactor runtime pm for baco · 9593f4d6

Authored by Rajneesh Bhardwaj on Jan 21, 2020

So far the kfd driver has implemented the same routines for runtime and
system-wide suspend and resume (s2idle or mem). During system-wide suspend
the kfd acquires an atomic lock that prevents any more user processes from
creating queues and interacting with the kfd driver and amd gpu. This
mechanism created a problem when the amdgpu device is runtime suspended
with BACO enabled. Any application that relies on the kfd driver fails to
load because the driver reports a locked kfd device since the gpu is
runtime suspended.

However, in an ideal case, when the gpu is runtime suspended the kfd driver
should be able to:

 - auto resume amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9593f4d6

17 Jan, 2020 1 commit

drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfd · ffa02269

Authored by Alex Sierra on Dec 19, 2019

[Why]
The TLB flush method using the kfd2kgd interface has been deprecated.
This implementation is now on the amdgpu_amdkfd API.

[How]
TLB flush functions are now implemented in amdgpu_amdkfd.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ffa02269

14 Jan, 2020 1 commit

drm/amd/powerplay: cover the powerplay implementation details V3 · 9530273e

Authored by Evan Quan on Jan 07, 2020

This can save users much trouble, as they do not
actually need to care whether swSMU or the traditional
powerplay routine should be used.

V2: apply the fixes to vi.c and cik.c also
V3: squash in oops fix

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9530273e

30 Oct, 2019 1 commit

drm/amdkfd: Delete duplicated queue bit map reservation · 533bfcae

Authored by Yong Zhao on Oct 24, 2019

The KIQ is on the second MEC and its reservation is covered in the
latter logic, so no need to reserve its bit twice.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

533bfcae

openeuler / Kernel synced successfully more than 1 year ago