提交 · 455d40c92713e9478ea9da057bf07719787c1c03 · openeuler / Kernel

06 1月, 2021 2 次提交

drm/amdgpu: switch hdp callback functions for hdp v4 · 455d40c9

由 Likun Gao 提交于 12月 28, 2020

Switch to use the HDP functions which unified on hdp structure instead of
the scattered hdp callback functions.
V2: clean up hdp reset ras error count function.
Signed-off-by: NLikun Gao <Likun.Gao@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

455d40c9

drm/amdgpu: add amdgpu_hdp structure · b291a387

由 Hawking Zhang 提交于 12月 24, 2020

amdgpu_hdp hold all the callbacks for hdp
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NLikun Gao <Likun.Gao@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b291a387

16 12月, 2020 3 次提交

drm/amdgpu: add check for ACPI power resources · b10c1c5b

由 Alex Deucher 提交于 12月 09, 2020

Check if the device has ACPI power resources so we can
enable runtime pm if so.
Acked-by: NEvan Quan <evan.quan@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b10c1c5b

drm/amdgpu: split BOCO and ATPX handling · fd496ca8

由 Alex Deucher 提交于 12月 09, 2020

In preparation for systems that support d3cold on dGPUs
independent of PX/HG.  No functional change intended.
Acked-by: NEvan Quan <evan.quan@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

fd496ca8

drm/amdgpu: add judgement for suspend/resume sequence · 9ca5b8a1

由 Likun Gao 提交于 12月 15, 2020

S0ix only makes sense on APUs since they are part of the platform, so
only when the ASIC is APU should set amdgpu_acpi_is_s0ix_supported flag
to deal with the related situation.
Signed-off-by: NLikun Gao <Likun.Gao@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9ca5b8a1

14 11月, 2020 1 次提交

drm/amdgpu: add s0i3 capacity check for s0i3 routine (v2) · 4cd078dc

由 Prike Liang 提交于 9月 09, 2020

add amdgpu_acpi_is_s0ix_supported() to check the platform
whether support s0i3.

v2: fix empty function parameters warning (void)
Signed-off-by: NPrike Liang <Prike.Liang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Acked-by: NHuang Rui <ray.huang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4cd078dc

13 11月, 2020 2 次提交

drm/amdgpu: add amdgpu_smuio structure · 293f2563

由 Hawking Zhang 提交于 11月 07, 2020

Add amdgpu_smuio structure in amdgpu_device to
provide various callback functions to support
smuio ip funcitonality
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NJohn Clements <john.clements@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

293f2563

gpu: drm: amd: amdgpu: amdgpu: Mark global variables as __maybe_unused · 02f40f82

由 Lee Jones 提交于 11月 05, 2020

These 3 variables are used in *some* sourcefiles which include
amdgpu.h, but not *all*.  This leads to a flurry of build warnings.

Fixes the following W=1 kernel build warning(s):

 from drivers/gpu/drm/amd/amdgpu/amdgpu.h:67,
 drivers/gpu/drm/amd/amdgpu/amdgpu.h:198:19: warning: ‘no_system_mem_limit’ defined but not used [-Wunused-const-variable=]
 drivers/gpu/drm/amd/amdgpu/amdgpu.h:197:19: warning: ‘debug_evictions’ defined but not used [-Wunused-const-variable=]
 drivers/gpu/drm/amd/amdgpu/amdgpu.h:196:18: warning: ‘sched_policy’ defined but not used [-Wunused-const-variable=]

 NB: Repeats ~650 times - snipped for brevity.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

02f40f82

27 10月, 2020 1 次提交

drm/amd/pm: correct the checks for polaris kickers · 73275181

由 Evan Quan 提交于 9月 25, 2020

By defining new Macros.
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

73275181

16 10月, 2020 1 次提交

drm/amd/pm: correct gfx and pcie settings on umd pstate switching(V2) · f2b75bc2

由 Evan Quan 提交于 8月 18, 2020

For entering UMD stable Pstate, the operations to enter rlc_safe
mode, disable mgcg_perfmon and disable PCIE aspm are needed. And
the opposite operations should be performed on UMD stable Pstate
exiting.

V2: take those ASICs(CI/SI/VI) which may not support this into
    consideration
Signed-off-by: NEvan Quan <evan.quan@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f2b75bc2

08 10月, 2020 1 次提交

drm/amdgpu: add per device user friendly xgmi events for vega20 · b4a7db71

由 Jonathan Kim 提交于 9月 01, 2020

Non-outbound data metrics are non useful so mark them as legacy.
Bucket new perf counters into device and not device ip.
Bind events to chip instead of IP.
Report available event counters and not number of hw counter banks.
Move DF public macros to private since not needed outside of IP version.

v5: cleanup by moving per chip configs into structs

v4: After more discussion, replace *_LEGACY references with IP references
to indicate concept of pmu-typed versus event-config-typed event
registration.

v3: attr groups const array is global but attr groups are allocated per
device which doesn't work and causes problems on memory allocation and
de-allocation for pmu unregister. Switch to building const attr groups
per pmu instead to simplify solution.

v2: add comments on sysfs structure and formatting.
Signed-off-by: NJonathan Kim <jonathan.kim@amd.com>
Reviewed-by: NHarish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b4a7db71

01 10月, 2020 3 次提交

drm/amdgpu: support indirect access reg outside of mmio bar (v2) · f7ee1874

由 Hawking Zhang 提交于 9月 18, 2020

support both direct and indirect accessor in unified
helper functions.

v2: Retire indirect mmio access via mm_index/data
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NKevin Wang <kevin1.wang@amd.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f7ee1874

drm/amdgpu: add helper function for indirect reg access (v3) · 1bba3683

由 Hawking Zhang 提交于 9月 17, 2020

Add helper function in order to remove RREG32/WREG32
in current pcie_rreg/wreg function for soc15 and
onwards adapters.
PCIE_INDEX/DATA pairs are used to access regsiters
outside of mmio bar in the helper functions.
The new helper functions help remove the recursion
of amdgpu_mm_rreg/wreg from pcie_rreg/wreg and
provide the oppotunity to centralize direct and
indirect access in a single function.

v2: Fixed typo and refine the comments

v3: Remove unnecessary volatile local variable
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NKevin Wang <kevin1.wang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1bba3683

drm/amdgpu: use function pointer for gfxhub functions · 8ffff9b4

由 Oak Zeng 提交于 9月 17, 2020

gfxhub functions are now called from function pointers,
instead of from asic-specific functions.
Signed-off-by: NOak Zeng <Oak.Zeng@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8ffff9b4

16 9月, 2020 3 次提交

drm/amdgpu: Fix consecutive DPC recovery failures. · c1dd4aa6

由 Andrey Grodzovsky 提交于 8月 24, 2020

Cache the PCI state on boot and before each case where we might
loose it.

v2: Add pci_restore_state while caching the PCI state to avoid
breaking PCI core logic for stuff like suspend/resume.

v3: Extract pci_restore_state from amdgpu_device_cache_pci_state
to avoid superflous restores during GPU resets and suspend/resumes.

v4: Style fixes.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c1dd4aa6

drm/amdgpu: Avoid accessing HW when suspending SW state · bf36b52e

由 Andrey Grodzovsky 提交于 7月 29, 2020

At this point the ASIC is already post reset by the HW/PSP
so the HW not in proper state to be configured for suspension,
some blocks might be even gated and so best is to avoid touching it.

v2: Rename in_dpc to more meaningful name
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bf36b52e

drm/amdgpu: Implement DPC recovery · c9a6b82f

由 Andrey Grodzovsky 提交于 7月 29, 2020

Add PCI Downstream Port Containment (DPC) with
basic recovery functionality

v2: remove pci_save_state to avoid breaking suspend/resume
v3: Fix style comments
v4: Improve description.
Signed-off-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c9a6b82f

27 8月, 2020 3 次提交

drm/amdkfd: fix set kfd node ras properties value · 5436ab94

由 Stanley.Yang 提交于 8月 17, 2020

The ctx->features are new RAS implementation which
is only available for Vega20 and onwards, it is not
available for vega10, vega10 should follow legacy
ECC implementation.

Changed from V1:
    wrap function to initialize kfd node properties

Changed from V2:
    remove wrap function and SDMA SRAM ECC check
Signed-off-by: NStanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: NGuchun Chen <guchun.chen@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

5436ab94

drm/amdgpu: add an asic callback for pre asic init · 9737a923

由 Alex Deucher 提交于 8月 19, 2020

This callback can be used by asics that need to
do something special prior to calling atom asic init.
Acked-by: NNirmoy Das <nirmoy.das@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9737a923

drm/amdgpu: Embed drm_device into amdgpu_device (v3) · 8aba21b7

由 Luben Tuikov 提交于 8月 14, 2020

a) Embed struct drm_device into struct amdgpu_device.
b) Modify the inline-f drm_to_adev() accordingly.
c) Modify the inline-f adev_to_drm() accordingly.
d) Eliminate the use of drm_device.dev_private,
   in amdgpu.
e) Switch from using drm_dev_alloc() to
   drm_dev_init().
f) Add a DRM driver release function, which frees
   the container amdgpu_device after all krefs on
   the contained drm_device have been released.

v2: Split out adding adev_to_drm() into its own
    patch (previous commit), making this patch
    more succinct and clear. More detailed commit
    description.
v3: squash in fix to call drmm_add_final_kfree()
    to avoid a warning.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

8aba21b7

25 8月, 2020 5 次提交

drm/amdgpu: Get DRM dev from adev by inline-f · 4a580877

由 Luben Tuikov 提交于 8月 24, 2020

Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

4a580877

drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) · 1348969a

由 Luben Tuikov 提交于 8月 24, 2020

Get the amdgpu_device from the DRM device by use
of an inline function, drm_to_adev(). The inline
function resolves a pointer to struct drm_device
to a pointer to struct amdgpu_device.

v2: Use a typed visible static inline function
    instead of an invisible macro.
Signed-off-by: NLuben Tuikov <luben.tuikov@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1348969a

drm/amdgpu: refine create and release logic of hive info · d95e8e97

由 Dennis Li 提交于 8月 18, 2020

Change to dynamically create and release hive info object,
which help driver support more hives in the future.

v2:
Change to save hive object pointer in adev, to avoid locking
xgmi_mutex every time when calling amdgpu_get_xgmi_hive.

v3:
1. Change type of hive object pointer in adev from void* to
amdgpu_hive_info*.
2. remove unnecessary variable initialization.
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d95e8e97

drm/amdgpu: change reset lock from mutex to rw_semaphore · 6049db43

由 Dennis Li 提交于 8月 20, 2020

clients don't need reset-lock for synchronization when no
GPU recovery.

v2:
change to return the return value of down_read_killable.

v3:
if GPU recovery begin, VF ignore FLR notification.
Reviewed-by: NMonk Liu <monk.liu@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6049db43

drm/amdgpu: refine codes to avoid reentering GPU recovery · 53b3f8f4

由 Dennis Li 提交于 8月 19, 2020

if other threads have holden the reset lock, recovery will
fail to try_lock. Therefore we introduce atomic hive->in_reset
and adev->in_gpu_reset, to avoid reentering GPU recovery.

v2:
drop "? true : false" in the definition of amdgpu_in_reset
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

53b3f8f4

19 8月, 2020 1 次提交

drm/amdgpu: Fix repeatly flr issue · 9a1cddd6

由 jqdeng 提交于 8月 07, 2020

Only for no job running test case need to do recover in
flr notification.
For having job in mirror list, then let guest driver to
hit job timeout, and then do recover.
Signed-off-by: Njqdeng <Emily.Deng@amd.com>
Acked-by: NNirmoy Das <nirmoy.das@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9a1cddd6

15 8月, 2020 1 次提交

drm/amdgpu: revert "fix system hang issue during GPU reset" · f1403342

由 Christian König 提交于 8月 12, 2020

The whole approach wasn't thought through till the end.

We already had a reset lock like this in the past and it caused the same problems like this one.

Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary.

This reverts commit df9c8d1a.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f1403342

07 8月, 2020 1 次提交

drm/amdkfd: option to disable system mem limit · b80f050f

由 Philip Yang 提交于 7月 27, 2020

If multiple process share system memory through /dev/shm, KFD allocate
memory should not fail if it reaches the system memory limit because
one copy of physical system memory are shared by multiple process.

Add module parameter no_system_mem_limit to provide user option to
disable system memory limit check at runtime using sysfs or during
driver module init using kernel boot argument. By default the system
memory limit is on.

Print out debug message to warn user if KFD allocate memory failed
because system memory reaches limit.
Signed-off-by: NPhilip Yang <Philip.Yang@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b80f050f

05 8月, 2020 6 次提交

drm/amdgpu: move vram usage by vbios to mman (v2) · 87ded5ca

由 Alex Deucher 提交于 7月 29, 2020

It's related to the memory manager so move it there.

v2: inline the structure
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

87ded5ca

drm/amdgpu: move IP discovery data to mman · 72de33f8

由 Alex Deucher 提交于 7月 29, 2020

It's related to the memory manager so move it there.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

72de33f8

drm/amdgpu: move stolen vga bo from amdgpu to amdgpu.gmc · fcbc92e2

由 Alex Deucher 提交于 7月 28, 2020

Since that is where we store the other data related to
the stolen vga memory.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

fcbc92e2

drm/amdgpu: use a define for the memory size of the vga emulator · 81b54fb7

由 Alex Deucher 提交于 7月 28, 2020

Rather than open coding it everywhere.
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

81b54fb7

drm/amdgpu: introduce a new parameter to configure how many KCQ we want(v5) · a300de40

由 Monk Liu 提交于 7月 27, 2020

what:
the MQD's save and restore of KCQ (kernel compute queue)
cost lots of clocks during world switch which impacts a lot
to multi-VF performance

how:
introduce a paramter to control the number of KCQ to avoid
performance drop if there is no kernel compute queue needed

notes:
this paramter only affects gfx 8/9/10

v2:
refine namings

v3:
choose queues for each ring to that try best to cross pipes evenly.

v4:
fix indentation
some cleanupsin the gfx_compute_queue_acquire()

v5:
further fix on indentations
more cleanupsin gfx_compute_queue_acquire()

TODO:
in the future we will let hypervisor driver to set this paramter
automatically thus no need for user to configure it through
modprobe in virtual machine
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a300de40

drm/amdgpu: add bad page count threshold in module parameter(v3) · acc0204c

由 Guchun Chen 提交于 7月 21, 2020

bad_page_threshold could be configured to enable/disable the
associated bad page retirement feature in RAS.

When it's -1, ras will use typical bad page failure value to
handle bad page retirement.

When it's 0, disable bad page retirement, and no bad page
will be recorded and saved.

For other valid value, driver will use this manual value
as the threshold value of totoal bad pages.

v2: correct documentation of this parameter.

v3: remove confused statement in documentation.
Signed-off-by: NGuchun Chen <guchun.chen@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

acc0204c

28 7月, 2020 1 次提交

drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a

由 Dennis Li 提交于 7月 08, 2020

when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover,
the atomic adev->in_gpu_reset and hive->in_reset are used to avoid
re-entering GPU recovery.

During GPU reset and resume, it is unsafe that other threads access GPU,
which maybe cause GPU reset failed. Therefore the new rw_semaphore
adev->reset_sem is introduced, which protect GPU from being accessed by
external threads during recovery.

v2:
1. add rwlock for some ioctls, debugfs and file-close function.
2. change to use dqm->is_resetting and dqm_lock for protection in kfd
driver.
3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid
re-enter GPU recovery for the same GPU hang.

v3:
1. change back to use adev->reset_sem to protect kfd callback
functions, because dqm_lock couldn't protect all codes, for example:
free_mqd must be called outside of dqm_lock;

[ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
[ 1230.177221] Call Trace:
[ 1230.178249]  dump_stack+0x98/0xd5
[ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
[ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
[ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
[ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
[ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
[ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
[ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
[ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
[ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
[ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
[ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
[ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
[ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
[ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
[ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
[ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
[ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
[ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
[ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
[ 1230.202831]  ksys_ioctl+0x98/0xb0
[ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
[ 1230.205174]  do_syscall_64+0x5f/0x250
[ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

2. remove try_lock and introduce atomic hive->in_reset, to avoid
re-enter GPU recovery.

v4:
1. remove an unnecessary whitespace change in kfd_chardev.c
2. remove comment codes in amdgpu_device.c
3. add more detailed comment in commit message
4. define a wrap function amdgpu_in_reset

v5:
1. Fix some style issues.
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Suggested-by: NAndrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: NChristian König <christian.koenig@amd.com>
Suggested-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Suggested-by: NLijo Lazar <Lijo.Lazar@amd.com>
Suggested-by: NLuben Tukov <luben.tuikov@amd.com>
Signed-off-by: NDennis Li <Dennis.Li@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

df9c8d1a

16 7月, 2020 1 次提交

drm/amdgpu: add module parameter choose reset mode · 273da6ff

由 Wenhui Sheng 提交于 7月 14, 2020

Default value is auto, doesn't change
original reset method logic.

v2: change to use parameter reset_method
v3: add warn msg if specified mode isn't supported
Signed-off-by: NLikun Gao <Likun.Gao@amd.com>
Signed-off-by: NWenhui Sheng <Wenhui.Sheng@amd.com>
Reviewed-by: NHawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

273da6ff

03 7月, 2020 2 次提交

Revert "drm/amdgpu: support access regs outside of mmio bar" · e78b579d

由 Hawking Zhang 提交于 6月 30, 2020

This reverts commit 2eee0229.
Fallback to a stable base until we have a correct new one
Signed-off-by: NHawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e78b579d

drm/amdgpu: SI support for VCE clock control · fb40bceb

由 Alex Jivin 提交于 6月 24, 2020

Port functionality from the Radeon driver to support
VCE clock control.
Signed-off-by: NAlex Jivin <alex.jivin@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

fb40bceb

01 7月, 2020 2 次提交

drm/amdkfd: Add eviction debug messages · b2057956

由 Felix Kuehling 提交于 6月 11, 2020

Use WARN to print messages with backtrace when evictions are triggered.
This can help determine the root cause of evictions and help spot driver
bugs triggering evictions unintentionally, or help with performance tuning
by avoiding conditions that cause evictions in a specific workload.

The messages are controlled by a new module parameter that can be changed
at runtime:

  echo Y > /sys/module/amdgpu/parameters/debug_evictions
  echo N > /sys/module/amdgpu/parameters/debug_evictions
Signed-off-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: NPhilip Yang <Philip.Yang@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b2057956

drm/amdgpu: Fix a buffer overflow handling the serial number · 8df1a28f

由 Dan Carpenter 提交于 6月 10, 2020

The comments say that the serial number is a 16-digit HEX string so the
buffer needs to be at least 17 characters to hold the NUL terminator.

The other issue is that "size" returned from sprintf() is the number of
characters before the NUL terminator so the memcpy() wasn't copying the
terminator. The serial number needs to be NUL terminated so that it
doesn't lead to a read overflow in amdgpu_device_get_serial_number().
Also it's just cleaner and faster to sprintf() directly to adev->serial[]
instead of using a temporary buffer.

Fixes: 81a16241 ("drm/amdgpu: Add unique_id and serial_number for Arcturus v3")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NEvan Quan <evan.quan@amd.com>

8df1a28f

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功