Commits · d68cf992ded575928cf4ddf7c64faff0d8dcce14 · openeuler / Kernel

15 Apr, 2022 · 2 commits

drm/amdkfd: fix race condition in kfd_wait_on_events · 250e64a3

Committed by Felix Kuehling on Apr 12, 2022

Add the waiters to the wait queue during initialization, while holding the
event spinlock. Otherwise the waiter will not get activated if the event
signals before being added to the wait queue.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
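
The idea behind this fix can be illustrated with a small userspace analogy. The sketch below is not the kfd_events.c code: `demo_event`, `demo_waiter`, `demo_init_waiter` and `demo_signal` are hypothetical names, and a C11 `atomic_flag` stands in for the event spinlock. It shows the waiter being checked against the event state and linked onto the wait list in one critical section, so a signal arriving at any later point finds the waiter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's kfd_event types. */
struct demo_waiter {
    bool activated;
    struct demo_waiter *next;
};

struct demo_event {
    atomic_flag lock;            /* plays the role of the event spinlock */
    bool signaled;
    struct demo_waiter *waiters; /* singly linked wait list */
};

static void demo_lock(struct demo_event *ev)
{
    while (atomic_flag_test_and_set_explicit(&ev->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

static void demo_unlock(struct demo_event *ev)
{
    atomic_flag_clear_explicit(&ev->lock, memory_order_release);
}

/* Read the event state and enqueue the waiter while holding the lock,
 * leaving no window in which a signal could miss the waiter. */
static void demo_init_waiter(struct demo_event *ev, struct demo_waiter *w)
{
    assert(ev && w);
    demo_lock(ev);
    w->activated = ev->signaled;
    w->next = ev->waiters;
    ev->waiters = w;
    demo_unlock(ev);
}

/* Mark the event signaled and activate every waiter already on the list. */
static void demo_signal(struct demo_event *ev)
{
    assert(ev);
    demo_lock(ev);
    ev->signaled = true;
    for (struct demo_waiter *w = ev->waiters; w; w = w->next)
        w->activated = true;
    demo_unlock(ev);
}
```

If the enqueue step were moved out of `demo_init_waiter` and done later, a `demo_signal` call in between would walk an empty list and the waiter would stay inactive, which is the kind of missed wakeup the commit message describes.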

drm/amdkfd: potential NULL dereference in kfd_set/reset_event() · abb5bc59

Committed by Dan Carpenter on Apr 13, 2022

If lookup_event_by_id() returns a NULL "ev" pointer then the
spin_lock(&ev->lock) will crash. This was detected by Smatch:

drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event()
error: we previously assumed 'ev' could be null (see line 639)

Fixes: 5273e82c ("drm/amdkfd: Improve concurrency of event handling")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
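
A minimal sketch of the pattern Smatch is asking for, written as standalone C rather than the actual kfd_events.c code. The table, `demo_lookup_event_by_id` and `demo_set_event` are hypothetical, and returning `-EINVAL` is an assumption made for illustration. The point is only that the lookup result is tested before anything dereferences it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_EVENTS 4
static_assert(DEMO_MAX_EVENTS > 0, "the demo table needs at least one slot");

/* Hypothetical event type and lookup table. */
struct demo_event {
    bool signaled;
};

static struct demo_event *demo_table[DEMO_MAX_EVENTS];

/* Returns NULL when no event is registered under the given id. */
static struct demo_event *demo_lookup_event_by_id(unsigned int id)
{
    return id < DEMO_MAX_EVENTS ? demo_table[id] : NULL;
}

static int demo_set_event(unsigned int id)
{
    struct demo_event *ev = demo_lookup_event_by_id(id);

    if (!ev)
        return -EINVAL; /* bail out before touching ev->... */

    /* A real implementation would take the per-event lock here. */
    ev->signaled = true;
    return 0;
}
```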

13 Apr, 2022 · 3 commits

drm/amdkfd: Cleanup IO links during KFD device removal · 46d18d51

Committed by Mukul Joshi on Apr 06, 2022

Currently, the IO links to a device being removed from the topology
are not cleared. As a result, dangling links are left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
   values are consistent after the device is removed:
   a. Adding a device and removing a GPU device are made mutually
      exclusive.
   b. The global proximity domain counter is no longer required to be
      an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
   changed due to device removal.

CC: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: shrink bitmap size in struct svm_validate_context · 3925f9b4

Committed by Lang Yu on Apr 12, 2022

A MAX_GPU_INSTANCE bits bitmap will suffice.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Asynchronously free events · 34d292d5

Committed by Felix Kuehling on Apr 07, 2022

The synchronize_rcu call in destroy_events can take several ms, which
noticeably slows down applications destroying many events. Use kfree_rcu
to free the event structure asynchronously and eliminate the
synchronize_rcu call in the user thread.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

12 Apr, 2022 · 1 commit

drm/amdkfd: Handle drain retry fault race with XNACK mode change · edd11922

Committed by Philip Yang on Apr 05, 2022

An application could change XNACK from enabled to disabled while KFD is
draining stale retry faults. The check for whether to drain retry faults
must therefore come before the check of xnack_enabled, to avoid reporting
an incorrect VM fault after the application changes the XNACK mode.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

08 Apr, 2022 · 1 commit

drm/amdkfd: Improve concurrency of event handling · 5273e82c

Committed by Felix Kuehling on Mar 01, 2022

Use rcu_read_lock to read p->event_idr concurrently with other readers
and writers. Use p->event_mutex only for creating and destroying events
and in kfd_wait_on_events.

Protect the contents of the kfd_event structure with a per-event
spinlock that can be taken inside the rcu_read_lock critical section.

This eliminates contention of p->event_mutex in set_event, which tends
to be on the critical path for dispatch latency even when busy waiting
is used. It also eliminates lock contention in event interrupt handlers.
Since the p->event_mutex is now used much less, the impact of requiring
it in kfd_wait_on_events should also be much smaller.

This should improve event handling latency for processes using multiple
GPUs concurrently.

v2: Reschedule the worker periodically to avoid soft lockup warnings

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Sean Keely <Sean.Keely@amd.com> # v1
Tested-by: Sanjay Tripathi <sanjay.tripathi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

06 Apr, 2022 · 1 commit

drm/amdkfd: Add missing NULL check in svm_range_map_to_gpu · 96621ca5

Committed by Philip Yang on Apr 04, 2022

bo_adev is NULL for system memory mapping to GPU.

Fixes: 30671b44 ("drm/amdgpu: fix TLB flushing during eviction")
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

05 Apr, 2022 · 1 commit

drm/amdgpu: fix TLB flushing during eviction · 30671b44

Committed by Christian König on Mar 30, 2022

Testing the valid bit is not enough to figure out if we
need to invalidate the TLB or not.

During eviction it is quite likely that we move a BO from VRAM to GTT and
update the page tables immediately to the new GTT address.

Rework the whole function to get all the necessary parameters directly as
value.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

01 Apr, 2022 · 2 commits

drm/amdkfd: Create file descriptor after client is added to smi_clients list · e4542269

Committed by Lee Jones on Mar 31, 2022

This ensures userspace cannot prematurely clean-up the client before
it is fully initialised which has been proven to cause issues in the
past.

Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Use atomic64_t type for pdd->tlb_seq · 8fde0248

Committed by Philip Yang on Mar 25, 2022

To support multi-threaded page table updates.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

26 Mar, 2022 · 10 commits

drm/amdgpu: remove table_freed param from the VM code · 8f8cc3fb

Committed by Christian König on Mar 17, 2022

Better to leave the decision when to flush the VM changes in the TLB to
the VM code.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: use tlb_seq from the VM subsystem for SVM as well v2 · 4d30a83c

Committed by Christian König on Mar 17, 2022

Instead of hand rolling the table_freed parameter.

v2: add some changes suggested by Philip

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: start using tlb_seq from the VM subsystem · bffa91da

Committed by Christian König on Mar 17, 2022

Instead of trying to figure out if a TLB flush is necessary or not use
the information provided by the VM subsystem now.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: print unmap queue status for RAS poison consumption (v3) · ed94aca6

Committed by Tao Zhou on Mar 21, 2022

Print the status when the unmap succeeds, and also tell the user that a
GPU reset is triggered when we fall back to the legacy way.

v2: make the message more explicit.
v3: change succeeds to succeeded.
    replace pr_warn with dev_warn.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: add RAS poison consumption handling for UTCL2 (v2) · 1990e29b

Committed by Tao Zhou on Mar 16, 2022

Do RAS page retirement and use gpu reset as fallback in UTCL2 fault
handler.

v2: replace vm fault event with poison consumed event in UTCL2
poison consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: replace source_id with client_id for RAS poison consumption · 9d8a8d78

Committed by Tao Zhou on Mar 16, 2022

Client ID is more accurate here, and it lets us handle a wider range of
cases.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: refine event_interrupt_poison_consumption · eed41975

Committed by Tao Zhou on Mar 15, 2022

Combine reading and setting poison flag as one atomic operation
and add print message for the function.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
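
To show why reading and setting a flag in one atomic step matters, here is a small standalone sketch. It uses C11 `atomic_exchange` as a stand-in for whichever kernel atomic primitive the patch actually uses, and `demo_process` and `demo_claim_poison` are hypothetical names. Because the old value is returned by the same operation that stores the new one, only one caller can ever observe the 0 to 1 transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-process state holding a poison flag. */
struct demo_process {
    atomic_int poison;
};

/* Returns true only for the caller that flips the flag from 0 to 1.
 * A separate load followed by a store would let two callers both see 0. */
static bool demo_claim_poison(struct demo_process *p)
{
    assert(p);
    int old = atomic_exchange(&p->poison, 1);
    return old == 0;
}
```

A handler built on this would do the expensive recovery work only when `demo_claim_poison` returns true and return early otherwise.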

drm/amdkfd: Check for potential null return of kmalloc_array() · ebbb7bb9

Committed by QintaoShen on Mar 24, 2022

kmalloc_array() may return NULL, in which case accessing 'event_waiters[i].wait' would cause a NULL pointer dereference.
Check the return value of kmalloc_array() to avoid this.

Signed-off-by: QintaoShen <unSimple1993@163.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Check use_xgmi_p2p before reporting hive_id · c5650327

Committed by Divya Shikre on Mar 22, 2022

Recently introduced commit 158a05a0 ("drm/amdgpu: Add
use_xgmi_p2p module parameter") did not update XGMI iolinks
when use_xgmi_p2p is disabled. Add fix to not create XGMI
iolinks in KFD topology when this parameter is disabled.

Fixes: 158a05a0 ("drm/amdgpu: Add use_xgmi_p2p module parameter")
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix Incorrect VMIDs passed to HWS · b7dfbd2e

Committed by Tushar Patel on Mar 17, 2022

Compute-only GPUs have more than 8 VMIDs allocated to KFD. Fix
this by passing the correct number of VMIDs to HWS.

v2: squash in warning fix (Alex)

Signed-off-by: Tushar Patel <tushar.patel@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

16 Mar, 2022 · 4 commits

drm/amdkfd: evict svm bo worker handle error · 9527b9ca

Committed by Philip Yang on Mar 11, 2022

Migrating VRAM to RAM may fail to find the VMA if the process is exiting
and the VMA has been removed, so the evict svm_bo worker sets
prange->svm_bo to NULL and warns about svm_bo ref count != 1 only if the
migration from VRAM to RAM succeeded.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU export dmabuf handles for GTT BOs · 65722ff6

Committed by David Yat Sin on Mar 08, 2022

Export dmabuf handles for GTT BOs so that their contents can be accessed
using SDMA during checkpoint/restore.

v2: Squash in fix from David to set dmabuf handle to invalid for BOs
that cannot be accessed using SDMA during checkpoint/restore.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU Refactor restore BO function · b38c074b

Committed by David Yat Sin on Mar 08, 2022

Refactor CRIU restore BO to reduce indentation.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: CRIU remove sync and TLB flush on restore · 67a359d8

Committed by David Yat Sin on Mar 08, 2022

When the process is getting restored, the queues are not mapped yet, so
there is no VMID assigned for this process and no TLBs to flush.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

10 Mar, 2022 · 1 commit

drm/amdkfd: bail out early if no get_atc_vmid_pasid_mapping_info · d55957fb

Committed by Yifan Zhang on Mar 09, 2022

It makes no sense to continue with an undefined vmid.

Fixes: c8b0507f ("drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call")
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

08 Mar, 2022 · 1 commit

drm/amdkfd: Add format attribute to kfd_smi_event_add · 53b97af4

Committed by Philip Yang on Mar 04, 2022

To enable compiler type checking of arguments against the format string in callers.

All warnings (new ones prefixed by >>):

>> warning: function 'kfd_smi_event_add' might be a candidate for
'gnu_printf' format attribute [-Wsuggest-attribute=format]

Fixes: d58b8a99 ("drm/amdkfd: Add SMI add event helper")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
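
For readers unfamiliar with the attribute the warning refers to, the following is a minimal standalone sketch. It is not the kfd_smi_event_add() code: `demo_event_add` and `DEMO_MSG_SIZE` are hypothetical. It uses the GCC/Clang `format(printf, ...)` attribute on a variadic function so that callers whose arguments do not match the format string get a compile-time diagnostic.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_MSG_SIZE 96 /* hypothetical message buffer size */

/* format(printf, 3, 4): parameter 3 is the format string and the variadic
 * arguments start at parameter 4. This is a GCC/Clang extension. */
static int demo_event_add(char *out, size_t out_len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int demo_event_add(char *out, size_t out_len, const char *fmt, ...)
{
    va_list args;
    int len;

    assert(out && out_len > 0);
    va_start(args, fmt);
    len = vsnprintf(out, out_len, fmt, args); /* always NUL-terminates */
    va_end(args);
    return len;
}
```

With the attribute in place, a call such as `demo_event_add(buf, sizeof(buf), "%d\n", "text")` is flagged by `-Wformat`, whereas without it the mismatch would go unnoticed until run time.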

05 Mar, 2022 · 1 commit

drm/amdkfd: judge get_atc_vmid_pasid_mapping_info before call · c8b0507f

Committed by Yifan Zhang on Mar 03, 2022

Fix the NULL pointer issue:

[ 3076.255609] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 3076.255624] #PF: supervisor instruction fetch in kernel mode
[ 3076.255637] #PF: error_code(0x0010) - not-present page
[ 3076.255649] PGD 0 P4D 0
[ 3076.255660] Oops: 0010 [#1] SMP NOPTI
[ 3076.255669] CPU: 20 PID: 2415 Comm: kfdtest Tainted: G        W  OE     5.11.0-41-generic #45~20.04.1-Ubuntu
[ 3076.255691] Hardware name: AMD Splinter/Splinter-RPL, BIOS VS2326337N.FD 02/07/2022
[ 3076.255706] RIP: 0010:0x0
[ 3076.255718] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 3076.255732] RSP: 0018:ffffb64283e3fc10 EFLAGS: 00010297
[ 3076.255744] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000027
[ 3076.255759] RDX: ffffb64283e3fc1e RSI: 0000000000000008 RDI: ffff8c7a87f60000
[ 3076.255776] RBP: ffffb64283e3fc78 R08: ffff8c7d88518ac0 R09: ffffb64283e3fa60
[ 3076.255791] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000000000f
[ 3076.255805] R13: ffff8c7bdcea5800 R14: ffff8c7a9f3f3000 R15: ffff8c7a8696bc00
[ 3076.255820] FS:  0000000000000000(0000) GS:ffff8c7d88500000(0000) knlGS:0000000000000000
[ 3076.255839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3076.255851] CR2: ffffffffffffffd6 CR3: 0000000109e3c000 CR4: 0000000000750ee0
[ 3076.255866] PKRU: 55555554
[ 3076.255873] Call Trace:
[ 3076.255884]  dbgdev_wave_reset_wavefronts+0x72/0x160 [amdgpu]
[ 3076.256025]  process_termination_cpsch.cold+0x26/0x2f [amdgpu]
[ 3076.256182]  ? ktime_get_mono_fast_ns+0x4e/0xa0
[ 3076.256196]  kfd_process_dequeue_from_all_devices+0x49/0x70 [amdgpu]
[ 3076.256328]  kfd_process_notifier_release+0x187/0x2b0 [amdgpu]
[ 3076.256451]  ? mn_itree_inv_end+0xdc/0x110
[ 3076.256463]  __mmu_notifier_release+0x74/0x1f0
[ 3076.256474]  exit_mmap+0x170/0x200
[ 3076.256484]  ? __handle_mm_fault+0x677/0x920
[ 3076.256496]  ? _cond_resched+0x19/0x30
[ 3076.256507]  mmput+0x5d/0x130
[ 3076.256518]  do_exit+0x332/0xaf0
[ 3076.256526]  ? handle_mm_fault+0xd7/0x2b0
[ 3076.256537]  do_group_exit+0x43/0xa0
[ 3076.256548]  __x64_sys_exit_group+0x18/0x20
[ 3076.256559]  do_syscall_64+0x38/0x90
[ 3076.256569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

03 Mar, 2022 · 3 commits

drm/amdkfd: Add SMI add event helper · d58b8a99

Committed by Philip Yang on Feb 25, 2022

To remove duplicate code, unify the event message format, and simplify
adding new events in the following patches.

Use KFD_SMI_EVENT_MSG_SIZE to define the msg size; the same size will be
used in user space to allocate the msg receive buffer.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Correct SMI event read size · 38abd56b

Committed by Philip Yang on Dec 16, 2021

sizeof(buf) is 8 bytes because buf is declared as unsigned char *buf,
so each SMI event read copies at most 8 bytes to the user buffer. Correct
this by using the allocated size of buf.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
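
The pitfall described here is easy to reproduce outside the kernel. The sketch below is an illustration only, with hypothetical names (`demo_len_buggy`, `demo_len_fixed`, `DEMO_MSG_SIZE`). It shows that `sizeof` applied to a pointer yields the size of the pointer, not of the buffer it points to, so the allocated size has to be carried separately.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MSG_SIZE 96 /* hypothetical allocation size */
static_assert(DEMO_MSG_SIZE > sizeof(void *),
              "the demo buffer must be larger than a pointer");

/* Buggy variant: sizeof(buf) measures the pointer itself
 * (8 bytes on typical 64-bit targets), not the allocation. */
static size_t demo_len_buggy(const unsigned char *buf, size_t requested)
{
    size_t limit = sizeof(buf);

    return requested < limit ? requested : limit;
}

/* Fixed variant: the caller passes the size that was actually allocated. */
static size_t demo_len_fixed(size_t alloc_size, size_t requested)
{
    return requested < alloc_size ? requested : alloc_size;
}
```

For a 64-byte read request against a `DEMO_MSG_SIZE` buffer, the buggy variant caps the copy at the pointer size while the fixed variant allows the full 64 bytes.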

Revert "drm/amdkfd: process_info lock not needed for svm" · e433d684

Committed by Philip Yang on Feb 25, 2022

This reverts commit 3abfe30d.

To fix deadlock in kFDSVMEvictTest when xnack off.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

24 Feb, 2022 · 3 commits

drm/amdkfd: Print bdf in peer map failure message · 0c41b9b5

Committed by Harish Kasiviswanathan on Feb 15, 2022

Print alloc node, peer node and memory domain when peer map fails. This
is more useful.

v2: use dev_err instead of pr_err
    use bdf to identify the peer gpu

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Use real device for messages · a0c5fd46

Committed by Felix Kuehling on Feb 18, 2022

kfd_chardev() doesn't provide much useful information in dev_... messages
on multi-GPU systems because there is only one KFD device, which doesn't
correspond to any particular GPU. Use the actual GPU device to indicate
the GPU that caused a message.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix for possible integer overflow · 8f7519b2

Committed by David Yat Sin on Feb 18, 2022

Fix for possible integer overflow when doing addition.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
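
The commit message does not say how the overflow is avoided, so the following is only a generic sketch of one common approach, not the change made in this patch. `demo_add_u32` is a hypothetical helper that checks whether an unsigned addition would wrap before performing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stores a + b in *sum and returns true, or returns false without
 * touching *sum if the addition would wrap around UINT32_MAX. */
static bool demo_add_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
    assert(sum);
    if (a > UINT32_MAX - b)
        return false;
    *sum = a + b;
    return true;
}
```

Comparing against `UINT32_MAX - b` keeps the check itself free of overflow, which a naive `a + b < a` test also achieves for unsigned types but expresses less directly.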

23 Feb, 2022 · 3 commits

drm/amdkfd: make CRAT table missing message informational only · 9dff13f9

Committed by Alex Deucher on Feb 18, 2022

The driver has a fallback if the Component Resource Association Table
(CRAT) is missing, so make the message informational rather than a
warning.

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix criu_restore_bo error handling · 22804e03

Committed by Felix Kuehling on Feb 18, 2022

Clang static analysis reports this problem
kfd_chardev.c:2327:2: warning: 1st function call argument
  is an uninitialized value
  kvfree(bo_privs);
  ^~~~~~~~~~~~~~~~

Make sure bo_buckets and bo_privs are initialized so freeing them in the
error handling code path will never result in undefined behaviour.

Fixes: 73fa13b6 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
Reported-by: Tom Rix <trix@redhat.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
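
The pattern behind this fix can be sketched in plain C. This is an illustration under stated assumptions, not the kfd_chardev.c code: `demo_restore` and its parameters are hypothetical, and `calloc`/`free` stand in for the kernel allocators. Pointers are initialised to NULL up front so that a shared cleanup label can free them no matter how early the function bails out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

static int demo_restore(size_t n_buckets, size_t n_privs)
{
    /* Initialised before any early exit so cleanup never sees garbage. */
    int *buckets = NULL;
    char *privs = NULL;
    int ret = 0;

    if (n_buckets == 0 || n_privs == 0) {
        ret = -EINVAL; /* nothing has been allocated yet */
        goto out;
    }

    buckets = calloc(n_buckets, sizeof(*buckets));
    if (!buckets) {
        ret = -ENOMEM;
        goto out;
    }

    privs = calloc(n_privs, sizeof(*privs));
    if (!privs) {
        ret = -ENOMEM;
        goto out;
    }

    /* ... the actual restore work would go here ... */

out:
    free(privs);   /* free(NULL) is a no-op */
    free(buckets);
    return ret;
}
```

Without the NULL initialisers, the `-EINVAL` path would pass indeterminate pointers to `free`, which is the class of problem the static analyser reported.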

drm/amdkfd: Drop IH ring overflow message to dbg · 757f9e4d

Committed by Kent Russell on Feb 18, 2022

When this was first implemented, overflows weren't expected in regular
operations, and tests weren't in place to cause said overflow. Now there
are cases where overflows occur with real workloads, but we know that
the kernel can handle this robustly, so move the message to a debug
message.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

18 Feb, 2022 · 1 commit

drm/amdkfd: Use proper enum in pm_unmap_queues_v9() · b63c54d9

Committed by Nathan Chancellor on Feb 17, 2022

Clang warns:

  drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_packet_manager_v9.c:267:3:
  error: implicit conversion from enumeration type 'enum
  mes_map_queues_extended_engine_sel_enum' to different enumeration type
  'enum mes_unmap_queues_extended_engine_sel_enum'
  [-Werror,-Wenum-conversion]
                  extended_engine_sel__mes_map_queues__sdma0_to_7_sel :
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Use 'extended_engine_sel__mes_unmap_queues__sdma0_to_7_sel' to eliminate
the warning, which is the same numeric value of the proper type.

Fixes: 009e9a15 ("drm/amdkfd: navi2x requires extended engines to map and unmap sdma queues")
Link: https://github.com/ClangBuiltLinux/linux/issues/1596
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

17 Feb, 2022 · 2 commits

drm/amdkfd: add return value check for queue eviction · 29b440d2

Committed by Tao Zhou on Feb 16, 2022

Otherwise gpu reset will be triggered unconditionally in poison
consumption.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Replace zero-length array with flexible-array member · d5c83156

Committed by Changcheng Deng on Feb 15, 2022

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use "flexible array members" for these cases. The older
style of one-element or zero-length arrays should no longer be used.
Reference:
https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
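
As a quick illustration of the construct this commit moves to, the sketch below declares a C99 flexible array member and allocates a structure with trailing elements. The struct and function names are hypothetical and unrelated to the amdkfd structures touched by the patch. In kernel code the size computation would normally go through the `struct_size()` helper rather than being open-coded.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure with a dynamically sized tail. */
struct demo_packet {
    uint32_t count;
    uint32_t items[]; /* flexible array member: last field, no size given */
};

/* Allocates a zeroed packet with room for `count` items, or returns NULL
 * if the size computation would overflow or the allocation fails. */
static struct demo_packet *demo_packet_alloc(uint32_t count)
{
    struct demo_packet *p;

    if (count > (SIZE_MAX - sizeof(*p)) / sizeof(p->items[0]))
        return NULL;

    /* sizeof(*p) does not include the flexible array itself. */
    p = calloc(1, sizeof(*p) + (size_t)count * sizeof(p->items[0]));
    if (!p)
        return NULL;

    p->count = count;
    return p;
}
```

Unlike the older `items[0]` or `items[1]` idioms, the `items[]` form lets the compiler know the array has no fixed bound and rejects placing it anywhere but at the end of the struct.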

openeuler / Kernel · synced successfully more than 1 year ago