Commits · bf9d4e88c28b397ec6ec289c592ed41b552b8929 · openeuler / Kernel

05 Jun, 2021 4 commits

drm/amdkfd: add yellow carp KFD support · bf9d4e88

Authored by Aaron Liu on Nov 04, 2020

This patch adds GFX10-based Yellow Carp KFD support.
We will bypass IOMMU v2.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Make TLB flush conditional on mapping · 31f33243

Authored by Eric Huang on Jun 01, 2021

This optimizes memory mapping latency and also avoids
a page fault in the corner case where a valid PDE changes
into a PTE.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add heavy-weight TLB flush after unmapping · 1098d658

Authored by Eric Huang on Jun 01, 2021

This is part of the memory mapping optimization.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add flush-type parameter to kfd_flush_tlb · 3543b055

Authored by Eric Huang on Jun 01, 2021

This provides more TLB flush type options for different
scenarios.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

22 May, 2021 1 commit

drm/amd/amdkfd: Drop unnecessary NULL check after container_of · 6a593769

Authored by Guenter Roeck on May 21, 2021

The first parameter passed to container_of() is the pointer to the work
structure passed to the worker and never NULL. The NULL check on the
result of container_of() is therefore unnecessary and misleading.
Remove it.

This change was made automatically with the following Coccinelle script.

@@
type t;
identifier v;
statement s;
@@

<+...
(
  t v = container_of(...);
|
  v = container_of(...);
)
  ...
  when != v
- if (\( !v \| v == NULL \) ) s
...+>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
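
To illustrate why the removed check was dead code, here is a minimal userspace sketch. The `container_of` macro below is a simplified version of the kernel one, and `struct demo_work`, `struct demo_device`, `demo_worker` and `demo_selfcheck` are hypothetical names that do not come from the patch. Because the macro only subtracts a constant member offset from a valid pointer, the result cannot be NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct work_struct. */
struct demo_work {
	int pending;
};

/* Hypothetical driver object that embeds the work item by value. */
struct demo_device {
	int id;
	struct demo_work work;
};

/*
 * Worker callback. "work" always points at the member embedded in a
 * struct demo_device, so the enclosing object is found by subtracting
 * a constant offset. No NULL check on the result is needed.
 */
int demo_worker(struct demo_work *work)
{
	struct demo_device *dev = container_of(work, struct demo_device, work);

	return dev->id;
}

/* Usage: the round trip from member pointer back to the object. */
void demo_selfcheck(void)
{
	struct demo_device dev = { .id = 7 };

	assert(container_of(&dev.work, struct demo_device, work) == &dev);
	assert(demo_worker(&dev.work) == 7);
}
```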

20 May, 2021 12 commits

drm/amdgpu: Add early fini callback · e9669fb7

Authored by Andrey Grodzovsky on May 19, 2021

Use it to call display code dependent on device->drv_data
before it's set to NULL on device unplug

v5:
Move HW finalization into this callback to prevent MMIO accesses
post PCI remove.

v7:
Split kfd suspend from device exit to expedite HW related
stuff to amdgpu_pci_remove

v8:
Squash previous KFD commit into this commit to avoid compile break.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210520032057.497334-1-andrey.grodzovsky@amd.com

drm/amdkfd: fix a resource leakage issue · 96b62c8a

Authored by Dennis Li on May 18, 2021

The function kfd_lookup_process_by_pasid increases the reference
count of the kfd_process object, so its caller must call
kfd_unref_process to decrease the reference count. Otherwise the
resource is leaked.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
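
The lookup/unref pairing described in this commit can be sketched in userspace as follows. This is an illustration only: `struct demo_process`, `demo_table` and the `demo_*` functions are hypothetical stand-ins, not the KFD implementation, and a plain integer replaces the kernel's reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process object with a plain integer reference count. */
struct demo_process {
	unsigned int pasid;
	int refcount;
};

/* One registered process, holding its initial reference. */
struct demo_process demo_table[] = {
	{ .pasid = 0x8001, .refcount = 1 },
};

/* Like the lookup in the commit: takes a reference on success. */
struct demo_process *demo_lookup_process_by_pasid(unsigned int pasid)
{
	size_t i;

	for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (demo_table[i].pasid == pasid) {
			demo_table[i].refcount++;
			return &demo_table[i];
		}
	}
	return NULL;
}

/* Drops the reference taken by the lookup. */
void demo_unref_process(struct demo_process *p)
{
	p->refcount--;
}

/* Caller pattern: every successful lookup is paired with one unref. */
int demo_handle_event(unsigned int pasid)
{
	struct demo_process *p = demo_lookup_process_by_pasid(pasid);

	if (!p)
		return -1;
	/* ... work with p while the reference is held ... */
	demo_unref_process(p);
	return 0;
}

/* Usage: the count is back at its initial value after handling. */
void demo_selfcheck(void)
{
	assert(demo_handle_event(0x8001) == 0);
	assert(demo_table[0].refcount == 1);
	assert(demo_handle_event(0x9999) == -1);
}
```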

drm/amdkfd: add kfd2kgd funcs for beige_goby kfd support · c86eb517

Authored by Chengming Gui on Oct 21, 2020

Add the function pointer.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: support beige_goby KFD · 5cf607cc

Authored by Chengming Gui on Oct 21, 2020

Add KFD support for beige_goby
v2: fix asic name typo
v3: squash in updates (Alex)
v4: squash in needs_atomics fix (Alex)
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: heavy-weight flush TLB after unmap · 765385ec

Authored by Philip Yang on May 13, 2021

We need to do a heavy-weight TLB flush to make sure we have no more
dirty data in the cache for the unmapped pages.

Define enum TLB_FLUSH_TYPE, add flush_type parameter to
amdgpu_amdkfd_flush_gpu_tlb_pasid.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: flush TLB after updating GPU page table" · 7a3ae1e2

Authored by Philip Yang on May 13, 2021

This reverts commit 1704ac8e.

After "drm/amdgpu: flush TLB if valid PDE turns into PTE" is checked
in, this workaround is not needed.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: flush TLB if valid PDE turns into PTE · bf546940

Authored by Philip Yang on May 12, 2021

Mapping a huge page (a 2MB aligned address with 2MB size) uses PDE0 as
a PTE. If a previously valid PDE0 (PDE0.V=1 and PDE0.P=0) turns into a
PTE, a TLB flush is required; otherwise the page table walker will not
read the updated PDE0.

Change page table update mapping to return table_freed flag to indicate
the previously valid PDE may have turned into a PTE if page table is
freed.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
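
The table_freed idea can be modelled with a small userspace sketch. Everything here is an assumption made for illustration: `struct demo_vm`, `demo_update_mapping` and `demo_map` are hypothetical names, and a counter stands in for the real TLB flush. The sketch only shows the control flow, where a flush is issued when the mapping update reports that a page table was freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified view of one PDE0 slot. */
struct demo_vm {
	bool pde0_is_table;	/* PDE0 currently points at a page table */
	int tlb_flushes;	/* number of flushes issued so far */
};

/*
 * Update the mapping. When a 2MB huge page replaces an existing page
 * table, the table is freed and PDE0 is reused as a PTE; that case is
 * reported through *table_freed.
 */
void demo_update_mapping(struct demo_vm *vm, bool huge_page,
			 bool *table_freed)
{
	*table_freed = huge_page && vm->pde0_is_table;
	vm->pde0_is_table = !huge_page;
}

/* Map a range and flush only when a page table was freed. */
void demo_map(struct demo_vm *vm, bool huge_page)
{
	bool table_freed;

	demo_update_mapping(vm, huge_page, &table_freed);
	if (table_freed)
		vm->tlb_flushes++;	/* stands in for the real TLB flush */
}

/* Usage: only the small-page to huge-page transition flushes. */
void demo_selfcheck(void)
{
	struct demo_vm vm = { .pde0_is_table = true, .tlb_flushes = 0 };

	demo_map(&vm, false);	/* small pages, table kept */
	assert(vm.tlb_flushes == 0);
	demo_map(&vm, true);	/* table freed, PDE0 becomes a PTE */
	assert(vm.tlb_flushes == 1);
	demo_map(&vm, true);	/* already a huge page, nothing freed */
	assert(vm.tlb_flushes == 1);
}
```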

drm/amdgpu: re-apply "use the new cursor in the VM code" v2 · 0ccc3ccf

Authored by Christian König on Mar 22, 2021

Now that we found the underlying problem we can re-apply this patch.

This reverts commit 6b44b667.

v2: rebase on KFD changes
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Albebaran: MTYPE_NC for coarse-grain remote memory · 2b2339ee

Authored by Felix Kuehling on May 10, 2021

MTYPE UC was used for a specific use case that ended up not being
implemented. Use NC for better performance for coarse-grained memory where
cache coherence during shader execution is not required.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Arcturus: MTYPE_NC for coarse-grain remote memory · 0c6f7777

Authored by Felix Kuehling on May 10, 2021

MTYPE UC was used for a specific use case that ended up not being
implemented. Use NC for better performance for coarse-grained memory where
cache coherence during shader execution is not required.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: refine the poison data consumption handling · e2b1f9f5

Authored by Dennis Li on May 11, 2021

User applications may register the KFD_EVENT_TYPE_HW_EXCEPTION and
KFD_EVENT_TYPE_MEMORY events, and the driver can notify them when poison
data is consumed. Besides that, some applications may register a SIGBUS
signal handler. These applications will handle poison data by themselves,
and exit or re-create the context to re-dispatch work.
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: new range accessible by all GPUs · a9a76bee

Authored by Philip Yang on May 05, 2021

If xnack is on and a new range is created, either to recover from a
retry VM fault or by SVM API calls, give all GPUs access to the range.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

11 May, 2021 10 commits

drm/amdkfd: handle errors returned by svm_migrate_copy_to_vram/ram · 04fe3fd1

Authored by Philip Yang on Apr 28, 2021

If the migration copy fails because the process is killed, or because
VRAM or system memory is exhausted, pass the error code back to the
caller so it can handle the error gracefully.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Rename to ras_*_enabled · 8ab0d6f0

Authored by Luben Tuikov on May 04, 2021

Rename,
  ras_hw_supported --> ras_hw_enabled, and
  ras_features     --> ras_enabled,
to show that ras_enabled is a subset of
ras_hw_enabled, which itself is a subset
of the ASIC capability.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: John Clements <john.clements@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
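
The subset relationship stated in this commit can be expressed as a bitmask check. The sketch below is illustrative only: the mask values and the `DEMO_*` names are hypothetical and do not correspond to real RAS block definitions. It assumes each bit represents one IP block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical masks, one bit per IP block. */
#define DEMO_ASIC_CAPABLE	0x0fu	/* what the ASIC could support */
#define DEMO_RAS_HW_ENABLED	0x07u	/* what the hardware has enabled */
#define DEMO_RAS_ENABLED	0x03u	/* what the driver has enabled */

/* True when every bit set in "sub" is also set in "super". */
bool demo_mask_is_subset(uint32_t sub, uint32_t super)
{
	return (sub & ~super) == 0;
}

/* Usage: ras_enabled within ras_hw_enabled within ASIC capability. */
void demo_selfcheck(void)
{
	assert(demo_mask_is_subset(DEMO_RAS_ENABLED, DEMO_RAS_HW_ENABLED));
	assert(demo_mask_is_subset(DEMO_RAS_HW_ENABLED, DEMO_ASIC_CAPABLE));
	/* A bit outside the hardware mask breaks the relationship. */
	assert(!demo_mask_is_subset(0x10u, DEMO_RAS_HW_ENABLED));
}
```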

drm/amdkfd: add ACPI SRAT parsing for topology · ddec8d3b

Authored by Eric Huang on Apr 23, 2021

In NPS4 BIOS we need to find the closest numa node when creating
topology io link between cpu and gpu, if PCI driver doesn't set
it.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Update L1 and add L2/3 cache information · 74abbded

Authored by Mike Li on Mar 26, 2021

The L1 cache information has been updated and the L2/L3
information has been added. The changes have been made
for Vega10 and newer ASICs. There are no changes
for the older ASICs before Vega10.
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix no atomics settings in the kfd topology · bdd24657

Authored by Jonathan Kim on Apr 30, 2021

To account for various PCIe and xGMI setups, check the no atomics settings
for a device in relation to every direct peer.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Make svm_migrate_put_sys_page static · 2e4ec251

Authored by Felix Kuehling on Apr 30, 2021

This function is only used in this source file.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: flush TLB after updating GPU page table · 1704ac8e

Authored by Philip Yang on Apr 15, 2021

This works around the situation where VM retry faults keep coming after
a page table update. We are investigating the root cause, but once this
issue happens, the application gets stuck and a reboot is sometimes
needed to recover.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add Aldebaran virtualization support · cecd91b4

Authored by Zhigang Luo on Apr 29, 2021

Update kfd_supported_devices to enable Aldebaran virtualization support.
Signed-off-by: Zhigang Luo <zhigang.luo@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: report the numa weight between host and device over xgmi · 559f418e

Authored by Jonathan Kim on Apr 21, 2021

GPUs connected to CPUs over xGMI are bidirectional so set weight by a
single hop both ways.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: report atomics support in io_links over xgmi · deb68983

Authored by Jonathan Kim on Apr 21, 2021

Link atomics support over xGMI should be reported independently of PCIe.
Do not set NO_ATOMICS flags on devices that support xGMI but that do not
have atomics support over PCIe.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: Ramesh Errabolu <ramesh.errabolu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29 Apr, 2021 9 commits

drm/amdkfd: Add Aldebaran gws support · 8baa6018

Authored by Harish Kasiviswanathan on Apr 20, 2021

v2: updated MEC FW version after validating gws with debugger
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: enable subsequent retry fault · b3dc91f9

Authored by Philip Yang on Apr 20, 2021

After draining stale retry faults, or after failing to validate the range
to recover, the fault address must be removed from the fault filter ring
so that a subsequent retry interrupt on the same address can be handled.
Otherwise the retry fault is not processed until the timeout has passed.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: handle stale retry fault · 373e3ccd

Authored by Philip Yang on Apr 19, 2021

Retry fault interrupt may be pending in IH ring after GPU page table
is updated to recover the vm fault, because each page of the range
generate retry fault interrupt. There is race if application unmap
range to remove and free the range first and then retry fault work
restore_pages handle the retry fault interrupt, because range can not be
found, this vm fault can not be recovered and report incorrect GPU vm
fault to application.

Before unmap to remove and free range, drain retry fault interrupt
from IH ring1 to ensure no retry fault comes after the range is removed.

Drain retry fault interrupt skip the range which is on deferred list
to remove, or the range is child range, which is split by unmap, does
not add to svms and have interval notifier.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: retry validation to recover range · 4999e398

Authored by Philip Yang on Apr 19, 2021

GPU VM retry fault recovery needs to retry range validation if

1. the range is split in parallel by unmap during recovery
2. the range migrates to system memory and is updated in system
memory during recovery
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix spelling mistake in packet manager · c3c5cc9a

Authored by Jonathan Kim on Apr 26, 2021

The plural of 'process' should be 'processes'.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix double free device pgmap resource · c0f76fc8

Authored by Philip Yang on Apr 26, 2021

Use devm_memunmap_pages instead of memunmap_pages to release pgmap
and remove pgmap from device action, to avoid double free pgmap when
unloading driver module.

Release device memory region if failed to create device memory pages
structure.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix spelling mistake "unregisterd" -> "unregistered" · dd57e65f

Authored by Colin Ian King on Apr 26, 2021

There is a spelling mistake in a pr_debug message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove unnecessary header include · be9064b7

Authored by Hawking Zhang on Apr 25, 2021

amdgpu.h is included in kfd_priv.h
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: John Clements <John.Clements@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix kernel-doc syntax error · 71ff0b4d

Authored by Fabio M. De Francesco on Apr 24, 2021

Fixed a kernel-doc error in the documentation of a function.
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

24 Apr, 2021 4 commits

drm/amdkfd: remove redundant initialization to variable r · a40eb089

Authored by Colin Ian King on Apr 22, 2021

The variable r is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: fix uint32 variable compared to less than zero · 65f8db81

Authored by Colin Ian King on Apr 22, 2021

Currently the call to kfd_process_gpuidx_from_gpuid is returning an
int value and this is being assigned to a uint32_t variable gpuidx
and this is being checked for a negative error return which is always
going to be false. Fix this by making gpuidx an int32_t. This makes
gpuidx also type consistent with the use of gpuidx from the callers.

Addresses-Coverity: ("Unsigned compared against 0")
Fixes: cda0f85b ("drm/amdkfd: refine migration policy with xnack on")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
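
The signedness problem this commit describes can be reproduced in a few lines. The sketch below is a userspace illustration under stated assumptions: `demo_gpuidx_from_gpuid` is a hypothetical stand-in for kfd_process_gpuidx_from_gpuid, and the GPU id 0x1234 is made up. It shows that a negative error code stored in a uint32_t wraps to a large positive value, so a "less than zero" test can never fire, while an int32_t keeps the error visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical lookup: returns an index >= 0, or -EINVAL on failure. */
int demo_gpuidx_from_gpuid(uint32_t gpu_id)
{
	return gpu_id == 0x1234u ? 0 : -EINVAL;
}

/*
 * Before the fix: the result lands in a uint32_t. -EINVAL wraps to a
 * large positive value, so a check such as "gpuidx < 0" is always
 * false and the error is silently ignored.
 */
uint32_t demo_store_unsigned(uint32_t gpu_id)
{
	uint32_t gpuidx = (uint32_t)demo_gpuidx_from_gpuid(gpu_id);

	return gpuidx;
}

/* After the fix: an int32_t keeps the sign, so the check works. */
int demo_lookup_fixed(uint32_t gpu_id)
{
	int32_t gpuidx = demo_gpuidx_from_gpuid(gpu_id);

	if (gpuidx < 0)
		return gpuidx;	/* propagate the error code */
	return 0;
}

/* Usage: an unknown id is detected only by the signed variant. */
void demo_selfcheck(void)
{
	assert(demo_store_unsigned(1) > 0x7fffffffu);
	assert(demo_lookup_fixed(1) == -EINVAL);
	assert(demo_lookup_fixed(0x1234u) == 0);
}
```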

drm/amdkfd: set attribute access for default ranges · 63f1af83

Authored by Alex Sierra on Apr 21, 2021

Attribute access value for default ranges is set, based on
process xnack on/off.
XNACK ON has GPU access attribute for unregistered ranges through page
fault. While XNACK OFF has no access attribute for unregistered ranges.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: svm ranges creation for unregistered memory · b19dbb7a

Authored by Alex Sierra on Apr 12, 2021

SVM ranges are created for unregistered memory, triggered
by page faults. These ranges are migrated/mapped to
GPU VRAM memory.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
