1. 21 May, 2021 37 commits
  2. 20 May, 2021 3 commits
    • drm/amdgpu: stop touching sched.ready in the backend · 81db370c
      Christian König committed
      This unfortunately comes up in regular intervals and breaks
      GPU reset for the engine in question.
      
      The sched.ready flag indicates whether an engine could be brought up
      during hw_init, and should never be set to false during hw_fini.
      
      v2: squash in unused variable fix (Alex)
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
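The rule in this commit message can be illustrated with a small user-space model. This is only a sketch: every `toy_*` name is hypothetical and not part of the amdgpu driver, `hw_ok` stands in for whatever check hw_init performs, and the reset loop that skips engines whose flag is false is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a ring and its scheduler state. */
struct toy_sched { bool ready; };
struct toy_ring {
	struct toy_sched sched;
	bool hw_ok;      /* models whether the hardware comes up */
	bool hw_running; /* models the current hardware state */
};

/* hw_init: the only place that decides whether the engine is usable. */
static int toy_hw_init(struct toy_ring *ring)
{
	ring->hw_running = ring->hw_ok;
	ring->sched.ready = ring->hw_ok;
	return ring->hw_ok ? 0 : -1;
}

/* hw_fini: stops the hardware but deliberately leaves sched.ready alone. */
static void toy_hw_fini(struct toy_ring *ring)
{
	ring->hw_running = false;
}

/*
 * Assumed reset path: engines that never came up are skipped, the rest
 * are torn down and re-initialized. If hw_fini cleared sched.ready, a
 * reset entered after teardown would skip the engine as well.
 */
static int toy_reset(struct toy_ring *ring)
{
	if (!ring->sched.ready)
		return -1; /* skipped */
	toy_hw_fini(ring);
	return toy_hw_init(ring);
}
```

In this model, calling `toy_hw_fini()` and then `toy_reset()` on a healthy ring still restarts it, because the flag kept the value that `toy_hw_init()` assigned.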
    • drm/amd/amdgpu: fix a potential deadlock in gpu reset · 6e8bcdd6
      Lang Yu committed
      When amdgpu_ib_ring_tests failed, the reset logic called
      amdgpu_device_ip_suspend twice, and a deadlock occurred.
      Deadlock log:
      
      [  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
      [  806.290952] [drm] free PSP TMR buffer
      
      [  806.319406] ============================================
      [  806.320315] WARNING: possible recursive locking detected
      [  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
      [  806.322135] --------------------------------------------
      [  806.323043] cat/2593 is trying to acquire lock:
      [  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.325668]
                     but task is already holding lock:
      [  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.328430]
                     other info that might help us debug this:
      [  806.329539]  Possible unsafe locking scenario:
      
      [  806.330549]        CPU0
      [  806.330983]        ----
      [  806.331416]   lock(&adev->dm.dc_lock);
      [  806.332086]   lock(&adev->dm.dc_lock);
      [  806.332738]
                      *** DEADLOCK ***
      
      [  806.333747]  May be due to missing lock nesting notation
      
      [  806.334899] 3 locks held by cat/2593:
      [  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
      [  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
      [  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.340869]
                     stack backtrace:
      [  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
      [  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
      [  806.344413] Call Trace:
      [  806.344849]  dump_stack+0x93/0xbd
      [  806.345435]  __lock_acquire.cold+0x18a/0x2cf
      [  806.346179]  lock_acquire+0xca/0x390
      [  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.347813]  __mutex_lock+0x9b/0x930
      [  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
      [  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
      [  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
      [  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
      [  806.354064]  mutex_lock_nested+0x1b/0x20
      [  806.354747]  ? mutex_lock_nested+0x1b/0x20
      [  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
      [  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
      [  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
      [  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
      [  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
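The lockdep report shows the same task already holding `dc_lock` from `dm_suspend` when it enters `dm_suspend` again. That failure mode can be modelled in user space as below. This is a sketch under stated assumptions: the `toy_*` names are hypothetical, the toy lock reports a second acquire instead of blocking, and the `suspended` guard is one possible way to avoid the double call, not necessarily what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: reports a second acquire instead of hanging. */
struct toy_lock { bool held; };

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -1; /* a real non-recursive mutex would block forever here */
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

struct toy_dev {
	struct toy_lock dc_lock;
	bool suspended; /* hypothetical guard, not a driver field */
};

/* Models a suspend step that returns with the lock still held. */
static int toy_suspend(struct toy_dev *d)
{
	if (toy_lock_acquire(&d->dc_lock))
		return -1;
	d->suspended = true;
	return 0;
}

/* Resume is the counterpart that releases the lock. */
static void toy_resume(struct toy_dev *d)
{
	d->suspended = false;
	toy_lock_release(&d->dc_lock);
}

/* Guarded variant: a second suspend without a resume becomes a no-op. */
static int toy_suspend_once(struct toy_dev *d)
{
	if (d->suspended)
		return 0;
	return toy_suspend(d);
}
```

Calling `toy_suspend()` twice without an intervening `toy_resume()` fails on the second call, which is where the real mutex would deadlock. `toy_suspend_once()` returns success on the second call without touching the lock.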
    • drm/amdgpu: modify system reference clock source for navi+ (V2) · 9a530062
      Aaron Liu committed
      Starting with Navi, the system clock is taken from the rlc reference
      clock in the vbios gfx_info table. It is incorrect to use
      core_refclk_10khz from the vbios smu_info table as the system clock.
      Signed-off-by: Aaron Liu <aaron.liu@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
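The source selection described in this commit message might look roughly like the sketch below. Everything here is hypothetical: the struct, the `toy_*` functions, the `is_navi_or_later` flag and the `gfx_rlc_refclk_10khz` field name are invented for illustration. The sketch also assumes both values use the 10 kHz unit that the name `core_refclk_10khz` suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the two vbios tables mentioned in the commit. */
struct toy_vbios {
	uint32_t smu_core_refclk_10khz; /* models smu_info core_refclk_10khz */
	uint32_t gfx_rlc_refclk_10khz;  /* assumed name and unit for the gfx_info rlc clock */
};

/* Pick the system reference clock source by ASIC generation. */
static uint32_t toy_system_refclk_10khz(const struct toy_vbios *v,
					bool is_navi_or_later)
{
	return is_navi_or_later ? v->gfx_rlc_refclk_10khz
				: v->smu_core_refclk_10khz;
}

/* Convert a value expressed in 10 kHz units to kHz. */
static uint32_t toy_10khz_to_khz(uint32_t value_10khz)
{
	return value_10khz * 10u;
}
```

With made-up table values of 10000 and 2500, the Navi-or-later path returns 2500 and the older path returns 10000, which `toy_10khz_to_khz()` converts to 100000 kHz.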