Commits · 8a7a3d1d0dcf2bb63dafe7275020420005e13e54 · openeuler / Kernel

18 Jun, 2020 · 1 commit

drm/amdkfd: Use correct major in devcgroup check · 99c7b309

Authored by Lorenz Brun on Jun 11, 2020

The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.

This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).

Signed-off-by: Lorenz Brun <lorenz@brun.one>
Fixes: 6b855f7b ("drm/amdkfd: Check against device cgroup")
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

99c7b309
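
The distinction in 99c7b309 matters because a devices cgroup rule is keyed on the (major, minor) pair of the device node, not on any driver version. Below is a minimal user-space sketch of that mismatch. The `dev_rule` type and the three helper functions are hypothetical and are not kernel APIs; 226 is the character-device major that Linux assigns to DRM, and the driver version used in the usage note is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Character-device major of the DRM subsystem on Linux. */
#define DRM_DEVICE_MAJOR 226

/* Hypothetical allow-list entry: one permitted character device. */
struct dev_rule {
    unsigned int major;
    unsigned int minor;
};

/* Return true if (major, minor) matches any rule in the allow-list. */
bool cgroup_allows(const struct dev_rule *rules, size_t n,
                   unsigned int major, unsigned int minor)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].major == major && rules[i].minor == minor)
            return true;
    }
    return false;
}

/* Buggy lookup: passes the driver's version number as the device major. */
bool check_with_driver_version(const struct dev_rule *rules, size_t n,
                               unsigned int driver_version_major,
                               unsigned int minor)
{
    return cgroup_allows(rules, n, driver_version_major, minor);
}

/* Fixed lookup: always uses the DRM subsystem's device major. */
bool check_with_device_major(const struct dev_rule *rules, size_t n,
                             unsigned int minor)
{
    return cgroup_allows(rules, n, DRM_DEVICE_MAJOR, minor);
}
```

With a single rule allowing device 226:128, the fixed lookup accepts minor 128, while the buggy lookup with a driver version of 3 rejects the same device even though the cgroup allows it.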

10 Jun, 2020 · 1 commit

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites · d8ed45c5

Authored by Michel Lespinasse on Jun 08, 2020

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d8ed45c5
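
The effect of the rule in d8ed45c5 on a single call site can be sketched in user space as follows. This is a toy model only: `toy_rwsem` and `toy_mm` are hypothetical stand-ins that count lock holders, and the wrapper bodies are an assumption about the general shape of such an API, not the kernel's implementation.

```c
/* Toy stand-in for the kernel's rw_semaphore: it only counts readers. */
struct toy_rwsem {
    int readers;
};

/* Toy stand-in for mm_struct, holding the lock as a named field. */
struct toy_mm {
    struct toy_rwsem mmap_sem;
};

/* Old style: call sites reach into the mm and name the lock field. */
void down_read(struct toy_rwsem *sem) { sem->readers++; }
void up_read(struct toy_rwsem *sem)   { sem->readers--; }

/* New style: wrappers take the mm, so the field name stays private. */
void mmap_read_lock(struct toy_mm *mm)   { down_read(&mm->mmap_sem); }
void mmap_read_unlock(struct toy_mm *mm) { up_read(&mm->mmap_sem); }

/* A call site after conversion; comments show the pre-conversion form. */
int count_readers_while_locked(struct toy_mm *mm)
{
    int seen;

    mmap_read_lock(mm);            /* was: down_read(&mm->mmap_sem); */
    seen = mm->mmap_sem.readers;
    mmap_read_unlock(mm);          /* was: up_read(&mm->mmap_sem);   */
    return seen;
}
```

Because every converted call site passes only `mm`, the lock field can later be renamed or replaced without touching the callers again.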

22 May, 2020 · 2 commits

drm/amdkfd: report the real PCI bus number · 997769fa

Authored by Evan Quan on May 21, 2020

The PCI bus number retrieved by PCI_BUS_NUM(pdev->devfn) is wrong,
so report the real bus number instead.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

997769fa
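
A short sketch of why PCI_BUS_NUM(pdev->devfn) cannot give the right answer: devfn is an 8-bit value holding only the slot and function, while PCI_BUS_NUM() extracts bits 8 to 15 of its argument. The two macros below are believed to match the kernel's definitions; `toy_pci_dev` and both helper functions are hypothetical simplifications, and reading the bus from a separate field is only an assumption about the general shape of the fix.

```c
#include <stdint.h>

/* Believed to mirror the kernel's PCI helper macros. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)

/* Hypothetical, simplified view of a PCI device. */
struct toy_pci_dev {
    uint8_t bus_number; /* bus the device sits on */
    uint8_t devfn;      /* encoded slot and function only */
};

/* Wrong: devfn has no bits above bit 7, so this is always 0. */
unsigned int bus_from_devfn(const struct toy_pci_dev *pdev)
{
    return PCI_BUS_NUM(pdev->devfn);
}

/* Right: read the bus number from where it is actually stored. */
unsigned int bus_from_bus_field(const struct toy_pci_dev *pdev)
{
    return pdev->bus_number;
}
```

For a device on bus 0x43, slot 2, function 1, the first helper reports bus 0 and the second reports 0x43.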

drm/amdkfd: Fix boolreturn.cocci warnings · 8c8e1f69

Authored by Aishwarya Ramakrishnan on May 18, 2020

Return statements in functions returning bool should use
true/false instead of 1/0.

drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c:40:9-10:
WARNING: return of 0/1 in function 'event_interrupt_isr_v9' with return type bool

Generated by: scripts/coccinelle/misc/boolreturn.cocci

Signed-off-by: Aishwarya Ramakrishnan <aishwaryarj100@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8c8e1f69

02 May, 2020 · 1 commit

drm/amdkfd: Use a systematic method to calculate queue mask bit · d09f85d5

Authored by Yong Zhao on Mar 04, 2020

The queue mask used for set_resources always assumes the queue number
per pipe is 8, so KFD needs to align with that by using function
amdgpu_queue_mask_bit_to_set_resource_bit().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d09f85d5
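
The remapping described in d09f85d5 can be sketched as below: a linear queue bit is split into (mec, pipe, queue) using the hardware's real geometry, then recombined with a fixed stride of 8 queues per pipe. The function name, its parameters, and the stride of 4 pipes per MEC are assumptions made for illustration; only the figure of 8 queues per pipe comes from the commit text.

```c
/* set_resources assumes this many queues per pipe (from the commit text). */
#define SET_RESOURCE_QUEUES_PER_PIPE 8
/* Assumed pipes-per-MEC stride in the set_resources layout (illustrative). */
#define SET_RESOURCE_PIPES_PER_MEC 4

/*
 * Convert a linear queue bit, laid out with the device's own
 * queues_per_pipe and pipes_per_mec, into the set_resources layout.
 */
int queue_bit_to_set_resource_bit(int queue_bit, int queues_per_pipe,
                                  int pipes_per_mec)
{
    int queue = queue_bit % queues_per_pipe;
    int pipe  = (queue_bit / queues_per_pipe) % pipes_per_mec;
    int mec   = queue_bit / queues_per_pipe / pipes_per_mec;

    return mec * SET_RESOURCE_PIPES_PER_MEC * SET_RESOURCE_QUEUES_PER_PIPE
         + pipe * SET_RESOURCE_QUEUES_PER_PIPE
         + queue;
}
```

When the device really has 8 queues per pipe and 4 pipes per MEC the mapping is the identity; with 4 queues per pipe, queue bit 5 (pipe 1, queue 1) lands on set_resources bit 9.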

01 May, 2020 · 3 commits

drm/amdkfd: Fix comment formatting · 0aeaaf64

Authored by Felix Kuehling on Apr 29, 2020

Corrected two function names. Added a missing space.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0aeaaf64

drm/amdkfd: Report domain with topology · 3e58e95a

Authored by Ori Messinger on Aug 21, 2019

PCI domain has moved to 32 bits to accommodate virtualization,
so a 32-bit integer is exposed for domain to reflect this change.

The domain can be found here:
/sys/class/kfd/kfd/topology/nodes/X/properties
where X is the card number.

Signed-off-by: Ori Messinger <ori.messinger@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3e58e95a

drm/amdkfd: Track GPU memory utilization per process · d4566dee

Authored by Mukul Joshi on Apr 28, 2020

Track GPU VRAM usage on a per-process basis and report it through
sysfs.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d4566dee

29 Apr, 2020 · 3 commits

drm/amdkfd: Enable over-subscription with >1 GWS queue · b8020b03

Authored by Joseph Greathouse on Sep 18, 2019

The current GWS usage model only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.

The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.

When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b8020b03

drm/amdkfd: Enable GWS based on FW Support · 29633d0e

Authored by Joseph Greathouse on Jan 15, 2020

Rather than only enabling GWS support based on the hws_gws_support
modparm, also check whether the GPU's HWS firmware supports GWS.
Leave the old modparm in place in case users want to test GWS
on GPUs not yet in the support list.

v2: fix broken syntax from the first patch.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29633d0e

drm/amdkfd: New IOCTL to allocate queue GWS (v2) · 5bb4b78b

Authored by Oak Zeng on May 06, 2019

Add a new kfd ioctl to allocate queue GWS. Queue
GWS is released on queue destroy.

v2: re-introduce this API with the following fixes squashed in:
- drm/amdkfd: fix null pointer dereference on dev
- drm/amdkfd: Return proper error code for gws alloc API
- drm/amdkfd: Remove GPU ID in GWS queue creation

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

5bb4b78b

28 Apr, 2020 · 1 commit

drm/amdkfd: Put ASIC revision into HSA capability · c6d1ec41

Authored by Joseph Greathouse on Apr 16, 2020

In order to surface the ASIC revision to user level, we want
to put it into the HSA topology. This is because different
ASIC revisions may require user-level software to do different
things (e.g. patch code for things that are changed in later
hardware revisions).

The ASIC revision from the hardware is at most 4 bits at this
time, so put it into 4 of the open bits in the HSA capability.
Then user-level software can use this capability information to
know -- for each ASIC -- what revision-based things must be done.

Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c6d1ec41
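
Packing a 4-bit revision into spare bits of a 32-bit capability word, as c6d1ec41 describes, can be sketched like this. The macro names and the bit position (shift 22) are illustrative assumptions, not values taken from the commit; the helper functions are hypothetical as well.

```c
#include <stdint.h>

/* Illustrative field position: 4 bits starting at bit 22 (assumed). */
#define EXAMPLE_ASIC_REV_SHIFT 22
#define EXAMPLE_ASIC_REV_MASK  (UINT32_C(0xF) << EXAMPLE_ASIC_REV_SHIFT)

/* Store a revision in the capability word, leaving other bits untouched. */
uint32_t cap_set_asic_revision(uint32_t capability, uint32_t revision)
{
    capability &= ~EXAMPLE_ASIC_REV_MASK;
    capability |= (revision << EXAMPLE_ASIC_REV_SHIFT) & EXAMPLE_ASIC_REV_MASK;
    return capability;
}

/* Read the revision back out, as user-level software would. */
uint32_t cap_get_asic_revision(uint32_t capability)
{
    return (capability & EXAMPLE_ASIC_REV_MASK) >> EXAMPLE_ASIC_REV_SHIFT;
}
```

The mask on the write path truncates any revision wider than 4 bits instead of letting it spill into neighbouring capability flags.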

23 Apr, 2020 · 1 commit

drm/amdkfd: Adjust three kfd dmesg printings during initialization · de430916

Authored by Yong Zhao on Apr 17, 2020

Delete two printings which are not very useful, and change one from
pr_info() to pr_debug().

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

de430916

14 Apr, 2020 · 1 commit

device_cgroup: Cleanup cgroup eBPF device filter code · eec8fd02

Authored by Odin Ugedal on Apr 03, 2020

The original cgroup v2 eBPF code for filtering device access made it
possible to compile with CONFIG_CGROUP_DEVICE=n and still use the eBPF
filtering. Commit 4b7d4d45 ("device_cgroup: Export
devcgroup_check_permission") reverted this, requiring the option to be
set to y.

Since the device filtering (and all the docs) for cgroup v2 is no longer
a "device controller" like it was in v1, someone might compile their
kernel with CONFIG_CGROUP_DEVICE=n. Then (for linux 5.5+) the eBPF
filter will not be invoked, and all processes will be allowed access
to all devices, no matter what the eBPF filter says.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Acked-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

eec8fd02

02 Apr, 2020 · 1 commit

drm/amdkfd: kfree the wrong pointer · 3148a6a0

Authored by Jack Zhang on Apr 01, 2020

The original code kfrees the wrong pointer for mem_obj,
which causes a memory leak under stress testing.

Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3148a6a0

19 Mar, 2020 · 1 commit

drm: amd: fix spelling mistake "shoudn't" -> "shouldn't" · 8cd29608

Authored by Colin Ian King on Mar 17, 2020

There are spelling mistakes in pr_err messages and a comment. Fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8cd29608

11 Mar, 2020 · 2 commits

drm/amdkfd: Consolidate duplicated bo alloc flags · 1d251d90

Authored by Yong Zhao on Mar 04, 2020

The ALLOC_MEM_FLAGS_* values are the same as the KFD_IOC_ALLOC_MEM_FLAGS_*
values, but the two sets are intermixed in the kernel driver, which hurts
readability. For example, KFD_IOC_ALLOC_MEM_FLAGS_COHERENT is not
referenced in the kernel, and it takes effect implicitly through
ALLOC_MEM_FLAGS_COHERENT, causing unnecessary confusion.

Replace all occurrences of ALLOC_MEM_FLAGS_* with
KFD_IOC_ALLOC_MEM_FLAGS_* to solve the problem.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1d251d90

drm/amdkfd: Use pr_debug to print the message of reaching event limit · 8f2e0c03

Authored by Yong Zhao on Mar 09, 2020

People are inclined to think of the previous pr_warn message as an
error, so use pr_debug instead.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8f2e0c03

07 Mar, 2020 · 2 commits

drm/amdkfd: Signal eviction fence on process destruction (v2) · 129657c8

Authored by Felix Kuehling on Mar 04, 2020

Otherwise BOs may wait for the fence indefinitely and never be destroyed.

v2: Signal the fence right after destroying queues to avoid unnecessary
    delayed-delete in kfd_process_wq_release

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: xinhui pan <xinhui.pan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

129657c8

drm/amdkfd: Add more comments on GFX9 user CP queue MQD workaround · 2f6ae2de

Authored by Yong Zhao on Mar 04, 2020

Because too many things are involved in this workaround, we need more
comments to avoid pitfalls.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

2f6ae2de

05 Mar, 2020 · 1 commit

drm/amdkfd: fix indentation issue · b84fe6ff

Authored by Colin Ian King on Feb 28, 2020

There is a statement that is indented with spaces instead of a tab.
Replace spaces with a tab.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b84fe6ff

29 Feb, 2020 · 2 commits

drm/amdkfd: change SDMA MQD memory type · f2cc50ce

Authored by Eric Huang on Feb 26, 2020

The SDMA MQD memory type is NC, which causes MQD data to be
accidentally overwritten by an old, stale cache line. Changing it to
the UC default for GART fixes the issue.

The mqd_gfx9 parameter is meant for control stacks that are
allocated together with user mode queue MQDs. Setting
mqd_gfx9 to true maps the control stack pages as NC.
Here it was accidentally applied to SDMA MQDs,
which are allocated together with the HIQ MQD. Setting
mqd_gfx9 to false avoids that.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Acked-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f2cc50ce

drm/amdkfd: Make get_tile_config() generic · fd7d08ba

Authored by Yong Zhao on Feb 26, 2020

Given we can query all the ASIC-specific information from amdgpu_gfx_config,
we can make get_tile_config() generic.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

fd7d08ba

27 Feb, 2020 · 7 commits

drm/amdkfd: Delete unnecessary unmap queue package submissions · c7637c95

Authored by Yong Zhao on Feb 05, 2020

The previous way of using the SDMA queue count to infer whether we should
unmap SDMA engines has bugs. The reason it did not cause issues is that MEC
firmware unmaps all queues (CP + SDMA) when an unmap package for the compute
engine is received. Because of that, only one unmap queue package
is needed, instead of one unmap queue package for CP and each SDMA engine,
which results in much simpler driver code.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c7637c95

drm/amdkfd: Delete excessive printings · 1e216474

Authored by Yong Zhao on Feb 05, 2020

Those printings are duplicated or useless.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1e216474

drm/amdkfd: Fix a memory leak in queue creation error handling · 66f28b9a

Authored by Yong Zhao on Feb 05, 2020

When the queue creation failed, some resources were not freed. Fix it.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

66f28b9a

drm/amdkfd: Count active CP queues directly · b42902f4

Authored by Yong Zhao on Feb 05, 2020

The previous code for calculating active CP queues is problematic if
some SDMA queues are inactive. Fix that by counting CP queues directly.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b42902f4

drm/amdkfd: Avoid ambiguity by indicating it's cp queue · e6945304

Authored by Yong Zhao on Jan 30, 2020

The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e6945304

drm/amdkfd: Rename queue_count to active_queue_count · 81b820b3

Authored by Yong Zhao on Jan 30, 2020

The new name makes the code easier to understand.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

81b820b3

drm/amd: Extend ROCt to surface UUID for devices that have them · 0c663695

Authored by Divya Shikre on Feb 25, 2020

Devices from Arcturus onwards will have their UUID exposed to Thunk.
Add the necessary functions to the kernel to propagate the UUID.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0c663695

13 Feb, 2020 · 2 commits

drm/amdkfd: refactor runtime pm for baco · 9593f4d6

Authored by Rajneesh Bhardwaj on Jan 21, 2020

So far the kfd driver has implemented the same routines for runtime and
system-wide suspend and resume (s2idle or mem). During system-wide suspend
the kfd acquires an atomic lock that prevents any more user processes from
creating queues and interacting with the kfd driver and the amd gpu. This
mechanism created a problem when the amdgpu device is runtime suspended
with BACO enabled. Any application that relies on the kfd driver fails to
load because the driver reports a locked kfd device since the gpu is
runtime suspended.

However, in an ideal case, when the gpu is runtime suspended the kfd driver
should be able to:

 - auto resume the amdgpu driver whenever a client requests compute service
 - prevent runtime suspend for amdgpu while kfd is in use

This change refactors the amdgpu and amdkfd drivers to support BACO and
runtime power management.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9593f4d6

drm/amdkfd: show warning when kfd is locked · 3c1224c0

Authored by Rajneesh Bhardwaj on Jan 21, 2020

During system suspend the kfd driver acquires a lock that prohibits
further kfd actions unless the gpu is resumed. This adds some info which
can be useful while debugging.

Reviewed-by: Oak Zeng <oak.zeng@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

3c1224c0

07 Feb, 2020 · 1 commit

drm/amdkfd: Add queue information to sysfs · 6d220a7e

Authored by Amber Lin on Jan 30, 2020

Provide compute queue information in sysfs under /sys/class/kfd/kfd/proc.
The format is /sys/class/kfd/kfd/proc/<pid>/queues/<queue id>/XX, where
XX is one of three files (size, type, and gpuid) representing the queue
size, the queue type, and the GPU this queue uses. The <queue id> folder
and the files underneath are generated when a queue is created. They are
removed when the queue is destroyed.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6d220a7e
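
A small user-space sketch of how a tool might build the per-queue sysfs paths described in 6d220a7e. The path layout and the three file names come from the commit message; the helper name `kfd_queue_attr_path()` and its signature are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format the sysfs path of one queue attribute into buf.
 * Returns the path length as reported by snprintf(), or -1 if attr is
 * not one of the three files named in the commit message.
 */
int kfd_queue_attr_path(char *buf, size_t len, int pid,
                        unsigned int queue_id, const char *attr)
{
    if (strcmp(attr, "size") != 0 && strcmp(attr, "type") != 0 &&
        strcmp(attr, "gpuid") != 0)
        return -1;

    return snprintf(buf, len, "/sys/class/kfd/kfd/proc/%d/queues/%u/%s",
                    pid, queue_id, attr);
}
```

Since the queue folder only exists while the queue does, a reader of these files should expect open() to fail once the queue has been destroyed.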

04 Feb, 2020 · 1 commit

drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode · f38abc15

Authored by Yong Zhao on Jan 29, 2020

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when an SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f38abc15
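
The ordering bug in f38abc15 can be reproduced with a toy model. It assumes, as the commit text implies, that the compute queue count is derived by subtracting sdma_queue_count from queue_count; `toy_dqm` and the functions below are hypothetical stand-ins, with the call to `compute_queue_count()` marking the point where the size calculation would run.

```c
/* Toy stand-in for the device queue manager's counters. */
struct toy_dqm {
    int queue_count;      /* all active queues, compute and SDMA */
    int sdma_queue_count; /* active SDMA queues only */
};

/* Assumed derivation of the compute queue count used by the size calc. */
int compute_queue_count(const struct toy_dqm *dqm)
{
    return dqm->queue_count - dqm->sdma_queue_count;
}

/* Old ordering: the size calc runs before sdma_queue_count is bumped. */
int create_sdma_queue_old(struct toy_dqm *dqm)
{
    int seen;

    dqm->queue_count++;
    seen = compute_queue_count(dqm); /* sees one compute queue too many */
    dqm->sdma_queue_count++;
    return seen;
}

/* Fixed ordering: both counters are updated before the size calc. */
int create_sdma_queue_fixed(struct toy_dqm *dqm)
{
    dqm->queue_count++;
    dqm->sdma_queue_count++;
    return compute_queue_count(dqm);
}
```

Starting from two compute queues and no SDMA queues, the old ordering observes three compute queues while the fixed ordering observes two; both leave the counters in the same final state.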

17 Jan, 2020 · 4 commits

drm/amdkfd: Add a message when SW scheduler is used · 52055039

Authored by Yong Zhao on Jan 10, 2020

The SW scheduler was previously called non HW scheduler, or non HWS. This
message is useful when triaging issues from dmesg.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

52055039

drm/amdkfd: use map_queues for hiq on gfx v10 as well · 8eee00f6

Authored by Huang Rui on Jan 10, 2020

To align with gfx v9, we use the map_queues packet to load hiq MQD.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

8eee00f6

drm/amdkfd: use kiq to load the mqd of hiq queue for gfx v9 (v6) · 35cd89d5

Authored by Aaron Liu on Dec 25, 2019

The CP checks that the HIQ queue is configured and mapped with the KIQ ring;
otherwise, it is unable to read back the secure buffer while gfxoff is
enabled, even with trusted IP blocks.

v1 -> v2:
- Fix to remove surplus set_resources packets.
- Fill the whole configuration in MQD.
- Change the author to Aaron because he addressed the key point of this issue.
- Add kiq ring lock.

v2 -> v3:
- Free the lock in the error return case.
- Remove the programming that is only needed when the queue is unmapped.

v3 -> v4:
- Remove doorbell programming because it's used for restarting queue.
- Remove CP scheduler programming because map_queue packet will handle this.

v4 -> v5:
- Remove cp_hqd_active because mec ucode will enable it when map_queues is used.
- Revise goto out_unlock.
- Correct the doorbell offset for HIQ that the kfd driver assigned in the
  packet.

v5 -> v6:
- Merge the Arcturus fix into this patch because it would otherwise oops on
  the Arcturus platform.

Reported-by: Lisa Saturday <Lisa.Saturday@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-and-Tested-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

35cd89d5

drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfd · ffa02269

Authored by Alex Sierra on Dec 19, 2019

[Why]
The TLB flush method using the kfd2kgd interface has been deprecated.
This implementation is now on the amdgpu_amdkfd API.

[How]
TLB flush functions are now implemented in amdgpu_amdkfd.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ffa02269

10 Jan, 2020 · 1 commit

drm/amdkfd: Improve kfd_process lookup in kfd_ioctl · 0f899fd4

Authored by Felix Kuehling on Dec 04, 2019

Use filep->private_data to store a pointer to the kfd_process data
structure. Take an extra reference for that, which gets released in
the kfd_release callback. Check that the process calling kfd_ioctl
is the same one that opened the file descriptor. Return -EBADF if it's
not, so that this error can be distinguished in user mode.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0f899fd4
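
The open/ioctl/release pattern described in 0f899fd4 can be sketched in user space as follows. All `toy_*` names are hypothetical, the reference count is a plain integer rather than a kernel refcount, and comparing pointers is a simplification of however the driver actually identifies the calling process.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for kfd_process and struct file. */
struct toy_process {
    int pid;
    int refcount;
};

struct toy_file {
    void *private_data;
};

/* open: remember the opener and take an extra reference for the file. */
void toy_open(struct toy_file *filep, struct toy_process *opener)
{
    opener->refcount++;
    filep->private_data = opener;
}

/* ioctl: only the process that opened the file may use it. */
int toy_ioctl(const struct toy_file *filep, const struct toy_process *caller)
{
    if (filep->private_data != caller)
        return -EBADF;
    return 0;
}

/* release: drop the reference taken in toy_open(). */
void toy_release(struct toy_file *filep)
{
    struct toy_process *p = filep->private_data;

    p->refcount--;
    filep->private_data = NULL;
}
```

Returning a dedicated error code lets user mode tell "this descriptor belongs to another process", for example after a fork, apart from other ioctl failures.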

08 Jan, 2020 · 1 commit

drm/amdkfd: Avoid hanging hardware in stop_cpsch · c2a77fde

Authored by Felix Kuehling on Dec 20, 2019

Don't use the HWS if it's known to be hanging. In a reset also
don't try to destroy the HIQ because that may hang on SRIOV if the
KIQ is unresponsive.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: shaoyunl <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c2a77fde
