- 09 October 2021, 1 commit
-
-
Committed by Alex Sierra
This fixes a deadlock on BO reservations during SVM_BO evictions while allocations in VRAM are performed concurrently. More specifically, while TTM waits for a fence to be signaled (ttm_bo_wait), it already holds the BO reservation. In parallel, the restore worker may be running, prefetching memory to VRAM; this also requires reserving the BO, but the worker takes the mmap semaphore first. The deadlock happens when the SVM_BO eviction worker kicks in and waits for the mmap semaphore held by the restore worker, preventing the fence from being signaled, so everything stalls until TTM times out. We no longer need to hold the BO reservation during validation and mapping, because the physical addresses are now taken from hmm_range_fault. We also take migrate_mutex to prevent range migration while validate_and_map updates the GPU page table.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 06 October 2021, 1 commit
-
-
Committed by Yifan Zhang
kfd_resume doesn't involve any IOMMU operation, so remove the redundant IOMMU cleanup code.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 05 October 2021, 3 commits
-
-
Committed by Alex Deucher
rather than asic type.

v2: fix up CZ case

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Alex Deucher
We can get the pdev and asic type from the adev. No need to pass them explicitly.

v2: squash in build fix for !CONFIG_HSA_AMD from Anson

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Tao Zhou
In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data.

v2: rename ras_process_cb to ras_poison_consumption_handler. Move the handler's implementation from the ASIC specific file to the common file.
v3: call gpu reset for xGMI connected mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 30 September 2021, 1 commit
-
-
Committed by Yang Li
Use the resource_size() helper on the resource object instead of computing the size explicitly. This cleans up a coccicheck warning:

./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:905:10-13: ERROR: Missing resource_size with res

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
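
For reference, resource_size() from <linux/ioport.h> returns the inclusive span of a struct resource (res->end - res->start + 1). A minimal sketch of the before/after pattern the coccicheck warning points at; the function and variable names are illustrative, not the actual kfd_migrate.c code:

#include <linux/ioport.h>

/* Illustrative only -- not the actual kfd_migrate.c code. */
static resource_size_t vram_span(const struct resource *res)
{
	/* Open-coded form that coccicheck flags:
	 *     return res->end - res->start + 1;
	 * Preferred form -- the helper encapsulates the inclusive range:
	 */
	return resource_size(res);
}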
-
- 24 September 2021, 4 commits
-
-
Committed by Philip Yang
The device manager releases device-specific resources when a driver disconnects from a device, so the devm_memunmap_pages and devm_release_mem_region calls in svm_migrate_fini are redundant. They cause the warning trace below after the patch "drm/amdgpu: Split amdgpu_device_fini into early and late", so remove svm_migrate_fini.

BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718

WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795 devm_release_action+0x51/0x60
Call Trace:
 ? memunmap_pages+0x360/0x360
 svm_migrate_fini+0x2d/0x60 [amdgpu]
 kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
 amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
 amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
 amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
 drm_dev_release+0x20/0x40 [drm]
 release_nodes+0x196/0x1e0
 device_release_driver_internal+0x104/0x1d0
 driver_detach+0x47/0x90
 bus_remove_driver+0x7a/0xd0
 pci_unregister_driver+0x3d/0x90
 amdgpu_exit+0x11/0x20 [amdgpu]

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
If SVM migration init fails to create the pgmap for device memory, set the pgmap type to 0 to disable the device SVM support capability.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
For xnack off, the restore work dma-unmaps the previous system memory page and dma-maps the updated system memory page to update the GPU mapping. This is not a DMA mapping leak, so remove the WARN_ONCE for DMA mapping leaks. prange->dma_addr stores the VRAM page pfn after the range is migrated to VRAM; we should not dma-unmap the VRAM page when updating the GPU mapping or removing the prange. Add a helper svm_is_valid_dma_mapping_addr to check for VRAM pages and error cases.

Mask out the SVM_RANGE_VRAM_DOMAIN flag in dma_addr before calling the amdgpu vm update to avoid BUG_ON(*addr & 0xFFFF00000000003FULL), and set it again immediately afterwards. This flag is used later to tell the page type apart when dma-unmapping system memory pages.

Fixes: 1d5dbfe6 ("drm/amdkfd: classify and map mixed svm range pages in GPU")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
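
A minimal sketch of the mask-and-restore pattern described above, assuming SVM_RANGE_VRAM_DOMAIN is a low tag bit in each dma_addr entry; the loop structure, names, and flag value are illustrative, not the actual svm_range_map_to_gpu code:

#include <stdbool.h>
#include <stdint.h>

/* Assumed tag bit; the value here is illustrative. */
#define SVM_RANGE_VRAM_DOMAIN (1ULL << 0)

/* Illustrative sketch only. */
static void map_range_sketch(uint64_t *dma_addr, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++) {
		bool is_vram = dma_addr[i] & SVM_RANGE_VRAM_DOMAIN;

		/* Strip the domain tag so the VM update helper only sees a
		 * clean address and its BUG_ON() sanity check passes.
		 */
		dma_addr[i] &= ~SVM_RANGE_VRAM_DOMAIN;

		/* ... call the amdgpu vm update helper for this page ... */

		/* Restore the tag right away so later dma-unmap code can
		 * still tell system memory pages from VRAM pages.
		 */
		if (is_vram)
			dma_addr[i] |= SVM_RANGE_VRAM_DOMAIN;
	}
}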
-
Committed by Philip Yang
An SVM range may include multiple VMAs with different vm_flags. If the prange page index is the last page of the VMA (offset + npages), update the GPU mapping to create the GPU page table with the same access permission as that VMA.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 16 September 2021, 2 commits
-
-
Committed by James Zhu
Separate kfd_iommu_resume from kfd_resume for fine-tuning of the amdgpu device init/resume/reset/recovery sequence.

v2: squash in fix for !CONFIG_HSA_AMD

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
-
Committed by Felix Kuehling
On some GPUs the PCIe atomic requirement for KFD depends on the MEC firmware version. Add a firmware version check for this. The minimum firmware version that works without atomics can be updated in the device_info structure for each GPU type.

Move PCIe atomic detection from kgd2kfd_probe into kgd2kfd_device_init because the MEC firmware is not loaded yet at the probe stage.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
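
A hedged sketch of the kind of per-GPU firmware-version check described here; the structure and field names (needs_pci_atomics, no_atomic_fw_version) are assumptions for illustration, not the actual kfd device code:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch -- names are assumptions. */
struct device_info_sketch {
	bool needs_pci_atomics;          /* GPU family nominally wants atomics */
	uint32_t no_atomic_fw_version;   /* first MEC fw that copes without them */
};

static bool pcie_atomics_required(const struct device_info_sketch *info,
				  uint32_t mec_fw_version)
{
	/* Atomics are only mandatory if the loaded MEC firmware is older
	 * than the first version known to work without them (0 means no
	 * such version exists for this GPU type).
	 */
	return info->needs_pci_atomics &&
	       (info->no_atomic_fw_version == 0 ||
		mec_fw_version < info->no_atomic_fw_version);
}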
-
- 15 September 2021, 2 commits
-
-
Committed by James Zhu
Separate kfd_iommu_resume from kfd_resume for fine-tuning of the amdgpu device init/resume/reset/recovery sequence.

v2: squash in fix for !CONFIG_HSA_AMD

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Felix Kuehling
On some GPUs the PCIe atomic requirement for KFD depends on the MEC firmware version. Add a firmware version check for this. The minimum firmware version that works without atomics can be updated in the device_info structure for each GPU type.

Move PCIe atomic detection from kgd2kfd_probe into kgd2kfd_device_init because the MEC firmware is not loaded yet at the probe stage.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 02 September 2021, 1 commit
-
-
Committed by Alex Sierra
In the SVM restore-pages interrupt handler, the kfd_process reference count was never dropped when xnack was disabled, so the object was never released.

Fixes: 2383f56b ("drm/amdkfd: page table restore through svm API")
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
-
- 27 August 2021, 1 commit
-
-
Committed by Sean Keely
On systems with multiple SH per SE, compute_static_thread_mgmt_se# is split into independent masks, one for each SH, in the upper and lower 16 bits. We need to detect this and apply cu masking to each SH. The cu mask bits are assigned first to each SE, then to alternate SHs, then finally to higher CU ids. This ensures that the maximum number of SPIs are engaged as early as possible while balancing CU assignment to each SH.

v2: Use max SH/SE rather than max SH in cu_per_sh.
v3: Fix comment blocks, ensure se_mask is initially zero filled, and correctly assign se.sh.cu positions to unset bits in cu_mask.

Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
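
A rough sketch of that round-robin distribution (SE advances fastest, then SH, then CU id), using illustrative names and sizes rather than the actual mqd manager code:

#include <stdint.h>

#define NUM_SE     4   /* illustrative */
#define NUM_SH     2   /* illustrative: two SHs per SE, low/high 16-bit masks */

/* Place each requested CU bit into se_mask[] so that consecutive bits land
 * on different SEs first, then alternate SHs, then climb to higher CU ids.
 */
static void distribute_cu_mask(uint32_t cu_mask, uint32_t se_mask[NUM_SE])
{
	unsigned int se = 0, sh = 0, cu = 0;

	for (unsigned int i = 0; i < NUM_SE; i++)
		se_mask[i] = 0;   /* must start zero-filled */

	while (cu_mask) {
		if (cu_mask & 1)
			se_mask[se] |= 1u << (sh * 16 + cu);
		cu_mask >>= 1;

		/* Advance SE fastest, then SH, then CU id. */
		if (++se == NUM_SE) {
			se = 0;
			if (++sh == NUM_SH) {
				sh = 0;
				cu++;
			}
		}
	}
}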
-
- 25 August 2021, 2 commits
-
-
Committed by Philip Yang
When restoring a retry fault or prefetch range, or restoring an svm range after eviction, map the range to the GPU with the correct read or write access permission. A range may include multiple VMAs, so update the GPU page table using the prange offset and number of pages of each VMA, according to that VMA's access permission.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
Check the range access permission when restoring a GPU retry fault: if the GPU retry fault is on an address that belongs to a VMA, and the VMA does not grant the read or write permission requested by the GPU, fail to restore the address. The VM fault event is then passed back to user space.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
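
A minimal sketch of this kind of permission check against a VMA's vm_flags; the function shape is an assumption for illustration, not the actual restore-pages logic:

#include <linux/mm.h>

/* Illustrative sketch -- not the actual svm restore-pages code. */
static bool fault_allowed_by_vma(const struct vm_area_struct *vma,
				 bool write_fault)
{
	/* A write fault needs VM_WRITE on the VMA; a read fault needs VM_READ.
	 * If the permission is missing, the caller fails the restore and the
	 * VM fault event is reported back to user space.
	 */
	if (write_fault)
		return !!(vma->vm_flags & VM_WRITE);
	return !!(vma->vm_flags & VM_READ);
}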
-
- 17 August 2021, 2 commits
-
-
Committed by Yifan Zhang
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress tests:

Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN      ] KFDSVMRangeTest.BasicSystemMemTest
[       OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN      ] KFDSVMRangeTest.SetGetAttributesTest
[          ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
  Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[          ] Setting/Getting atrributes
[  FAILED  ]

The root cause is that the svm work queue has not finished when svm_range_get_attr is called, so garbage svm interval tree data makes svm_range_get_attr return a wrong result. Flush the work queue before iterating the svm interval tree.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
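
A hedged sketch of the described fix: flush the deferred SVM work before walking the interval tree. The structure and field name (deferred_list_work) are assumptions for illustration:

#include <linux/workqueue.h>

/* Illustrative sketch -- the structure and field names are assumptions. */
struct svms_sketch {
	struct work_struct deferred_list_work;  /* deferred range-list updates */
	/* ... interval tree, locks, etc. ... */
};

static void get_attr_sketch(struct svms_sketch *svms)
{
	/* Make sure any pending deferred range updates have completed, so
	 * the interval tree is consistent before it is iterated.
	 */
	flush_work(&svms->deferred_list_work);

	/* ... walk the svm interval tree and collect the attributes ... */
}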
-
Committed by Yifan Zhang
KFDSVMRangeTest.SetGetAttributesTest randomly fails in stress tests:

Note: Google Test filter = KFDSVMRangeTest.*
[==========] Running 18 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 18 tests from KFDSVMRangeTest
[ RUN      ] KFDSVMRangeTest.BasicSystemMemTest
[       OK ] KFDSVMRangeTest.BasicSystemMemTest (30 ms)
[ RUN      ] KFDSVMRangeTest.SetGetAttributesTest
[          ] Get default atrributes
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:154: Failure
Value of: expectedDefaultResults[i]
  Actual: 4294967295
Expected: outputAttributes[i].value
Which is: 0
/home/yifan/brahma/libhsakmt/tests/kfdtest/src/KFDSVMRangeTest.cpp:152: Failure
Value of: expectedDefaultResults[i]
  Actual: 4
Expected: outputAttributes[i].type
Which is: 2
[          ] Setting/Getting atrributes
[  FAILED  ]

The root cause is that the svm work queue has not finished when svm_range_get_attr is called, so garbage svm interval tree data makes svm_range_get_attr return a wrong result. Flush the work queue before iterating the svm interval tree.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 12 August 2021, 2 commits
-
-
Committed by Mukul Joshi
This patch adds support for programming trap handler settings when loading the driver with the software scheduler (sched_policy=2).

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Philip Yang
For xnack on, if the range is ACCESS or ACCESS_IN_PLACE (AIP) by a single GPU, or the range is ACCESS_IN_PLACE by multiple GPUs that are all connected on the same XGMI hive, the best prefetch location is the prefetch_loc GPU. Otherwise the best prefetch location is always the CPU, because a GPU cannot coherently map the VRAM of other GPUs, even with a large-BAR PCIe connection.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
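
A hedged sketch of that decision as straight-line logic; the parameters, helper name, and the value used for the CPU location are illustrative assumptions, not the actual best-prefetch-location code:

#include <stdbool.h>
#include <stdint.h>

#define PREFETCH_LOC_CPU 0   /* illustrative: 0 stands for system memory */

/* Illustrative sketch of the best-prefetch-location rule. */
static uint32_t best_prefetch_location(uint32_t prefetch_loc_gpu,
				       unsigned int num_access_gpus,
				       bool all_access_in_place,
				       bool all_on_same_xgmi_hive)
{
	/* A single GPU with ACCESS or AIP: prefetch into that GPU's VRAM. */
	if (num_access_gpus == 1)
		return prefetch_loc_gpu;

	/* Multiple GPUs, all AIP and all on the same XGMI hive: still VRAM. */
	if (all_access_in_place && all_on_same_xgmi_hive)
		return prefetch_loc_gpu;

	/* Otherwise one GPU's VRAM is not coherently mappable by the others,
	 * so system memory (CPU) is the safe prefetch location.
	 */
	return PREFETCH_LOC_CPU;
}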
-
- 07 August 2021, 1 commit
-
-
Committed by Felix Kuehling
Currently the SVM get_attr call allows querying which flags are set in the entire address range. Add the opposite query: which flags are clear in the entire address range. Both queries can be combined in a single get_attr call, which allows answering questions such as "is this address range coherent, non-coherent, or a mix of both?"

Proposed userspace for the UAPI: https://github.com/RadeonOpenCompute/ROCR-Runtime/tree/memory_model_queries

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
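
To illustrate how the two query results combine, a small hedged sketch of the interpretation on the caller's side; the flag and enum names are placeholders, not actual KFD UAPI identifiers:

#include <stdint.h>

/* Placeholder names -- not actual KFD UAPI identifiers. */
#define FLAG_COHERENT (1u << 0)

enum range_coherency { RANGE_COHERENT, RANGE_NONCOHERENT, RANGE_MIXED };

/* set_flags:   flags that are set across the entire range
 * clear_flags: flags that are clear across the entire range
 */
static enum range_coherency classify(uint32_t set_flags, uint32_t clear_flags)
{
	if (set_flags & FLAG_COHERENT)
		return RANGE_COHERENT;     /* coherent everywhere */
	if (clear_flags & FLAG_COHERENT)
		return RANGE_NONCOHERENT;  /* coherent nowhere */
	return RANGE_MIXED;                /* set in some parts, clear in others */
}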
-
- 06 August 2021, 1 commit
-
-
Committed by Graham Sider
Add u32 gfx_target_version field to kfd_node_properties and kfd_device_info. Populate <asic>_device_info structs accordingly and expose to sysfs. This allows eliminating device-ID-based lookup tables in user mode for future ASICs.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 03 August 2021, 4 commits
-
-
Committed by Eric Huang
This works around a HW bug on other ASICs, and is based on reverting two earlier commits:

drm/amdkfd: Add heavy-weight TLB flush after unmapping
drm/amdkfd: Add memory sync before TLB flush on unmap

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 4bba567c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 7ed9876c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 430f8e6e.

Revert reason: Issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 29 July 2021, 3 commits
-
-
Committed by Eric Huang
This reverts commit 4bba567c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 7ed9876c.

Revert reason: The issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 430f8e6e.

Revert reason: Issue has been resolved.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 23 July 2021, 7 commits
-
-
Committed by Tao Zhou
Add KFD support for cyan_skillfish.

v2: whitespace fixes (Alex)

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Graham Sider
Update Arcturus/Aldebaran thermal throttle SMI event path to use ASIC-independent throttler bits when logging.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Oak Zeng
start_cpsch and stop_cpsch can be called during kfd device initialization or during gpu reset/recovery, so they can run concurrently. Currently in start_cpsch and stop_cpsch, pm_init and pm_uninit are not protected by the dpm lock. Imagine a case where a user uses a packet manager function to submit a pm4 packet to hang the hws (i.e. via cat /sys/class/kfd/kfd/topology/nodes/1/gpu_id | sudo tee /sys/kernel/debug/kfd/hang_hws) while the kfd device is under reset/recovery, so the packet manager may not be initialized; that results in an unpredictable protection fault. This patch moves pm_init/pm_uninit inside the dpm lock and checks that the packet manager is initialized before using packet manager functions.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
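
A hedged sketch of the locking pattern described here; the structure, lock, and flag names are illustrative assumptions, not the actual device queue manager or packet manager code:

#include <linux/errno.h>
#include <linux/mutex.h>

/* Illustrative sketch -- names are assumptions. */
struct pm_state_sketch {
	struct mutex lock;      /* stands in for the scheduler lock */
	bool pm_initialized;    /* set/cleared under the lock */
};

static void start_cpsch_sketch(struct pm_state_sketch *s)
{
	mutex_lock(&s->lock);
	/* The pm_init() equivalent runs under the lock so a concurrent
	 * stop/start cannot observe a half-initialized packet manager.
	 */
	s->pm_initialized = true;
	mutex_unlock(&s->lock);
}

static int submit_hang_hws_packet_sketch(struct pm_state_sketch *s)
{
	int ret = 0;

	mutex_lock(&s->lock);
	/* Refuse to touch the packet manager while it is torn down,
	 * e.g. during gpu reset/recovery.
	 */
	if (!s->pm_initialized)
		ret = -ENODEV;
	/* ... else build and submit the pm4 packet ... */
	mutex_unlock(&s->lock);
	return ret;
}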
-
Committed by Oak Zeng
This variable will be used to determine whether the packet manager is initialized or not, in a future patch.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Oak Zeng
Rename packets to packet_mgr to reflect the real meaning of this variable.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Acked-by: Christian Konig <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
Similar to xGMI reporting the min/max bandwidth between direct peers, PCIe will report the min/max bandwidth to the KFD.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Jonathan Kim
Report the min/max bandwidth in megabytes to the kfd for direct xgmi connections only. Indirect peers will report 0 since the indirect route is unknown.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 13 July 2021, 2 commits
-
-
Committed by Eric Huang
This reverts commit 1098d658.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Committed by Eric Huang
This reverts commit 31f33243.

Reason for revert: it causes regressions on several ASICs.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-