1. 15 Oct, 2021 1 commit
  2. 15 Jun, 2021 1 commit
    • drm/amd/amdgpu: fix a potential deadlock in gpu reset · b65aa179
      Committed by Lang Yu
      stable inclusion
      from stable-5.10.42
      commit 4951dd498d483fa961c92541b55ffb32db7f2dbf
      bugzilla: 55093
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 9c2876d5 ]
      
      When amdgpu_ib_ring_tests failed, the reset logic called
      amdgpu_device_ip_suspend twice, then deadlock occurred.
      Deadlock log:
      
      [  805.655192] amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
      [  806.290952] [drm] free PSP TMR buffer
      
      [  806.319406] ============================================
      [  806.320315] WARNING: possible recursive locking detected
      [  806.321225] 5.11.0-custom #1 Tainted: G        W  OEL
      [  806.322135] --------------------------------------------
      [  806.323043] cat/2593 is trying to acquire lock:
      [  806.323825] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.325668]
                     but task is already holding lock:
      [  806.326664] ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.328430]
                     other info that might help us debug this:
      [  806.329539]  Possible unsafe locking scenario:
      
      [  806.330549]        CPU0
      [  806.330983]        ----
      [  806.331416]   lock(&adev->dm.dc_lock);
      [  806.332086]   lock(&adev->dm.dc_lock);
      [  806.332738]
                      *** DEADLOCK ***
      
      [  806.333747]  May be due to missing lock nesting notation
      
      [  806.334899] 3 locks held by cat/2593:
      [  806.335537]  #0: ffff888100d3f1b8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
      [  806.337009]  #1: ffff888136b1fd78 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
      [  806.339018]  #2: ffff888136b1cdc8 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.340869]
                     stack backtrace:
      [  806.341621] CPU: 6 PID: 2593 Comm: cat Tainted: G        W  OEL    5.11.0-custom #1
      [  806.342921] Hardware name: AMD Celadon-CZN/Celadon-CZN, BIOS WLD0C23N_Weekly_20_12_2 12/23/2020
      [  806.344413] Call Trace:
      [  806.344849]  dump_stack+0x93/0xbd
      [  806.345435]  __lock_acquire.cold+0x18a/0x2cf
      [  806.346179]  lock_acquire+0xca/0x390
      [  806.346807]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.347813]  __mutex_lock+0x9b/0x930
      [  806.348454]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.349434]  ? amdgpu_device_indirect_rreg+0x58/0x70 [amdgpu]
      [  806.350581]  ? _raw_spin_unlock_irqrestore+0x47/0x50
      [  806.351437]  ? dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.352437]  ? rcu_read_lock_sched_held+0x4f/0x80
      [  806.353252]  ? rcu_read_lock_sched_held+0x4f/0x80
      [  806.354064]  mutex_lock_nested+0x1b/0x20
      [  806.354747]  ? mutex_lock_nested+0x1b/0x20
      [  806.355457]  dm_suspend+0xb8/0x1d0 [amdgpu]
      [  806.356427]  ? soc15_common_set_clockgating_state+0x17d/0x19 [amdgpu]
      [  806.357736]  amdgpu_device_ip_suspend_phase1+0x78/0xd0 [amdgpu]
      [  806.360394]  amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
      [  806.362926]  amdgpu_device_pre_asic_reset+0xb3/0x270 [amdgpu]
      [  806.365560]  amdgpu_device_gpu_recover.cold+0x679/0x8eb [amdgpu]
      Signed-off-by: Lang Yu <Lang.Yu@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b65aa179
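
The lockdep report above shows the task already holding `adev->dm.dc_lock`, taken in `dm_suspend`, when a second pass through `amdgpu_device_ip_suspend` tries to take it again. The toy model below, in plain C, illustrates that hazard. All `toy_*` names are hypothetical stand-ins, not the driver's code, and the toy lock returns an error at the point where a real non-recursive mutex would block forever.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: it only remembers whether it is held. */
struct toy_lock {
	bool held;
};

/* Returns 0 on success, or -EDEADLK where a real mutex would hang. */
static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

struct toy_dev {
	struct toy_lock dc_lock;
};

/* Assumption: suspend takes dc_lock and leaves it held until resume. */
static int toy_suspend(struct toy_dev *d)
{
	return toy_lock_acquire(&d->dc_lock);
}

static void toy_resume(struct toy_dev *d)
{
	toy_lock_release(&d->dc_lock);
}
```

Calling `toy_suspend()` twice without an intervening `toy_resume()` reproduces the pattern in the log: the second call finds the lock already held by the same caller.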
  3. 03 Jun, 2021 1 commit
  4. 08 Feb, 2021 1 commit
  5. 28 Jan, 2021 2 commits
  6. 25 Nov, 2020 1 commit
  7. 04 Nov, 2020 1 commit
  8. 29 Oct, 2020 2 commits
    • amdgpu: fix a few kernel-doc markup issues · b28d70c6
      Committed by Mauro Carvalho Chehab
      A kernel-doc markup can't be mixed with a random comment,
      as it causes parsing problems.
      
      While here, change an invalid kernel-doc markup into
      a common comment.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Link: https://lore.kernel.org/r/e899f50404e94ac9a7c3267dd34f951c1a44fb2b.1603791716.git.mchehab+huawei@kernel.org
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      b28d70c6
    • drm: amdgpu: kernel-doc: update some adev parameters · ca766ff0
      Committed by Mauro Carvalho Chehab
      Running "make htmldocs" produces lots of warnings on those files:
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:177: warning: Excess function parameter 'p_size' description in 'amdgpu_vram_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c:211: warning: Excess function parameter 'man' description in 'amdgpu_vram_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:90: warning: Excess function parameter 'p_size' description in 'amdgpu_gtt_mgr_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c:134: warning: Excess function parameter 'man' description in 'amdgpu_gtt_mgr_fini'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
      	./drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:675: warning: Excess function parameter 'dev' description in 'amdgpu_device_asic_init'
      
      They're related to the replacement of some parameters by adev,
      and to a few renamed parameters.
      
      While here, unify the name of the parameter so that it is
      the same in all functions using a pointer to struct amdgpu_device.
      
      Update the kernel-doc documentation accordingly.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Link: https://lore.kernel.org/r/5755c2b361890b8ae5cea0f61dfd70b1c135eefe.1603791716.git.mchehab+huawei@kernel.org
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      ca766ff0
  9. 22 Oct, 2020 1 commit
  10. 10 Oct, 2020 1 commit
  11. 01 Oct, 2020 2 commits
  12. 30 Sep, 2020 1 commit
  13. 26 Sep, 2020 4 commits
  14. 18 Sep, 2020 2 commits
  15. 16 Sep, 2020 8 commits
  16. 04 Sep, 2020 2 commits
  17. 29 Aug, 2020 1 commit
    • drm/amdgpu: fix compiler warnings · e230ac11
      Committed by Nirmoy Das
      Fixes the compiler warnings below:
       CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_device.o
      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘static’ is not at beginning of declaration [-Wold-style-declaration]
        381 | void static inline amdgpu_mm_wreg_mmio(struct amdgpu_device *adev, uint32_t reg, uint32_t v, uint32_t acc_flags)
            | ^~~~
      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:381:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c: In function ‘amdgpu_device_fini’:
      drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:3381:6: warning: variable ‘r’ set but not used [-Wunused-but-set-variable]
       3381 |  int r;
            |      ^
      Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      e230ac11
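
The `-Wold-style-declaration` warnings quoted above come from writing `void static inline ...`, where the return type precedes the `static` and `inline` specifiers. A minimal sketch of the conventional ordering follows. The `toy_regs` array and the `toy_wreg`/`toy_rreg` helpers are hypothetical stand-ins for illustration, not the driver's register accessors.

```c
#include <stdint.h>

/* Hypothetical register file standing in for MMIO space. */
static uint32_t toy_regs[16];

/* Storage-class and function specifiers come first, then the return type. */
static inline void toy_wreg(uint32_t reg, uint32_t v)
{
	toy_regs[reg % 16] = v;
}

static inline uint32_t toy_rreg(uint32_t reg)
{
	return toy_regs[reg % 16];
}
```

Both orderings declare the same function; placing `static inline` first only silences the warning and matches the usual kernel style.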
  18. 27 Aug, 2020 4 commits
  19. 25 Aug, 2020 4 commits
    • drm/amdgpu: Get DRM dev from adev by inline-f · 4a580877
      Committed by Luben Tuikov
      Add a static inline adev_to_drm() to obtain
      the DRM device pointer from an amdgpu_device pointer.
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      4a580877
    • drm/amdgpu: drm_device to amdgpu_device by inline-f (v2) · 1348969a
      Committed by Luben Tuikov
      Get the amdgpu_device from the DRM device by use
      of an inline function, drm_to_adev(). The inline
      function resolves a pointer to struct drm_device
      to a pointer to struct amdgpu_device.
      
      v2: Use a typed visible static inline function
          instead of an invisible macro.
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      1348969a
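
The two commits above replace open-coded pointer access with typed inline helpers, `adev_to_drm()` and `drm_to_adev()`. The sketch below shows the general idea with hypothetical `toy_*` structures. The field names and the pointer-based linkage are assumptions made for illustration, since the commit messages do not show the real structure layout.

```c
/* Hypothetical DRM-side device with an opaque back-pointer. */
struct toy_drm_device {
	void *dev_private;
};

/* Hypothetical driver-side device that points at its DRM device. */
struct toy_amdgpu_device {
	struct toy_drm_device *ddev;
};

/* Typed helper: the compiler checks the argument and return types. */
static inline struct toy_drm_device *
toy_adev_to_drm(struct toy_amdgpu_device *adev)
{
	return adev->ddev;
}

/* The inverse direction, resolving the opaque back-pointer in one place. */
static inline struct toy_amdgpu_device *
toy_drm_to_adev(struct toy_drm_device *ddev)
{
	return ddev->dev_private;
}
```

Compared with a macro, a static inline function rejects a wrongly typed argument at compile time, and if the linkage between the two structures changes later, only the helper bodies need updating.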
    • drm/amdgpu: annotate a false positive recursive locking · 08ebb485
      Committed by Dennis Li
      Re-apply commit 72e14ebf
      
      [  584.110304] ============================================
      [  584.110590] WARNING: possible recursive locking detected
      [  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
      [  584.111164] --------------------------------------------
      [  584.111456] kworker/38:1/553 is trying to acquire lock:
      [  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.112112]
                     but task is already holding lock:
      [  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.113068]
                     other info that might help us debug this:
      [  584.113689]  Possible unsafe locking scenario:
      
      [  584.114350]        CPU0
      [  584.114685]        ----
      [  584.115014]   lock(&adev->reset_sem);
      [  584.115349]   lock(&adev->reset_sem);
      [  584.115678]
                      *** DEADLOCK ***
      
      [  584.116624]  May be due to missing lock nesting notation
      
      [  584.117284] 4 locks held by kworker/38:1/553:
      [  584.117616]  #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630
      [  584.117967]  #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630
      [  584.118358]  #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu]
      [  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.119222]
                     stack backtrace:
      [  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
      [  584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [  584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
      [  584.121638] Call Trace:
      [  584.122050]  dump_stack+0x98/0xd5
      [  584.122499]  __lock_acquire+0x1139/0x16e0
      [  584.122931]  ? trace_hardirqs_on+0x3b/0xf0
      [  584.123358]  ? cancel_delayed_work+0xa6/0xc0
      [  584.123771]  lock_acquire+0xb8/0x1c0
      [  584.124197]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.124599]  down_write+0x49/0x120
      [  584.125032]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.125472]  amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu]
      [  584.125910]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu]
      [  584.126367]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu]
      [  584.126789]  process_one_work+0x29e/0x630
      [  584.127208]  worker_thread+0x3c/0x3f0
      [  584.127621]  ? __kthread_parkme+0x61/0x90
      [  584.128014]  kthread+0x12f/0x150
      [  584.128402]  ? process_one_work+0x630/0x630
      [  584.128790]  ? kthread_park+0x90/0x90
      [  584.129174]  ret_from_fork+0x3a/0x50
      
      Each adev owns its own lock_class_key to avoid false-positive
      recursive locking reports.
      
      v2:
      1. register adev->lock_key into lockdep, otherwise lockdep will
      report the below warning
      
      [ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
      [ 1216.705924] ------------[ cut here ]------------
      [ 1216.705972] DEBUG_LOCKS_WARN_ON(1)
      [ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210
      
      v3:
      Change to use down_write_nest_lock to annotate the false deadlock
      warning.
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      08ebb485
    • drm/amdgpu: refine create and release logic of hive info · d95e8e97
      Committed by Dennis Li
      Change to dynamically create and release the hive info object,
      which helps the driver support more hives in the future.
      
      v2:
      Change to save hive object pointer in adev, to avoid locking
      xgmi_mutex every time when calling amdgpu_get_xgmi_hive.
      
      v3:
      1. Change type of hive object pointer in adev from void* to
      amdgpu_hive_info*.
      2. remove unnecessary variable initialization.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      d95e8e97
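
The commit above describes hive info objects that are created on demand and released when no longer needed. One common way to get that lifetime is reference counting, sketched below in plain C. This is an assumption-laden illustration: the `toy_hive` type, the linked list, and the `toy_hive_get`/`toy_hive_put` helpers are hypothetical, the sketch is single-threaded, and it leaves out the locking a driver would need.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical hive object, allocated on first use. */
struct toy_hive {
	uint64_t hive_id;
	int refcount;
	struct toy_hive *next;
};

static struct toy_hive *toy_hive_list;

/* Look up a hive by id, creating it if absent; takes one reference. */
static struct toy_hive *toy_hive_get(uint64_t hive_id)
{
	struct toy_hive *h;

	for (h = toy_hive_list; h; h = h->next) {
		if (h->hive_id == hive_id) {
			h->refcount++;
			return h;
		}
	}

	h = calloc(1, sizeof(*h));
	if (!h)
		return NULL;
	h->hive_id = hive_id;
	h->refcount = 1;
	h->next = toy_hive_list;
	toy_hive_list = h;
	return h;
}

/* Drop one reference; unlink and free on the last one. Returns 1 if freed. */
static int toy_hive_put(struct toy_hive *h)
{
	struct toy_hive **pp;

	if (--h->refcount > 0)
		return 0;

	for (pp = &toy_hive_list; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			break;
		}
	}
	free(h);
	return 1;
}
```

A device that caches the pointer returned by `toy_hive_get()` can reuse it without walking the list again, which mirrors the v2 note about saving the hive pointer in adev to avoid taking xgmi_mutex on every lookup.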