1. 24 Mar, 2021 3 commits
  2. 14 Jan, 2021 1 commit
  3. 07 Jan, 2021 2 commits
    • drm/amdgpu: fix potential memory leak during navi12 deinitialization · e6d5c64e
      Jiawei Gu committed
      Navi12 HDCP & DTM deinitialization needs to continue and free the bo if
      it was already created, even though the initialized flag is not set.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix a memory protection fault when remove amdgpu device · 9a029a3f
      Dennis Li committed
      ASD and TA share the same firmware on SIENNA_CICHLID, and only the TA
      firmware is requested during boot, so only the TA firmware needs to be
      released when the device is removed.
      
      [   83.877150] general protection fault, probably for non-canonical address 0x1269f97e6ed04095: 0000 [#1] SMP PTI
      [   83.888076] CPU: 0 PID: 1312 Comm: modprobe Tainted: G        W  OE     5.9.0-rc5-deli-amd-vangogh-0.0.6.6-114-gdd99d5669a96-dirty #2
      [   83.901160] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
      [   83.912353] RIP: 0010:free_fw_priv+0xc/0x120
      [   83.917531] Code: e8 99 cd b0 ff b8 a1 ff ff ff eb 9f 4c 89 f7 e8 8a cd b0 ff b8 f4 ff ff ff eb 90 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <4c> 8b 67 18 48 89 fb 4c 89 e7 e8 45 94 41 00 b8 ff ff ff ff f0 0f
      [   83.937576] RSP: 0018:ffffbc34c13a3ce0 EFLAGS: 00010206
      [   83.943699] RAX: ffffffffbb681850 RBX: ffffa047f117eb60 RCX: 0000000080800055
      [   83.951879] RDX: ffffbc34c1d5f000 RSI: 0000000080800055 RDI: 1269f97e6ed04095
      [   83.959955] RBP: ffffbc34c13a3cf0 R08: 0000000000000000 R09: 0000000000000001
      [   83.968107] R10: ffffbc34c13a3cc8 R11: 00000000ffffff00 R12: ffffa047d6b23378
      [   83.976166] R13: ffffa047d6b23338 R14: ffffa047d6b240c8 R15: 0000000000000000
      [   83.984295] FS:  00007f74f6712540(0000) GS:ffffa047fbe00000(0000) knlGS:0000000000000000
      [   83.993323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   84.000056] CR2: 0000556a1cca4e18 CR3: 000000021faa8004 CR4: 00000000003706f0
      [   84.008128] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   84.016155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   84.024174] Call Trace:
      [   84.027514]  release_firmware.part.11+0x4b/0x70
      [   84.033017]  release_firmware+0x13/0x20
      [   84.037803]  psp_sw_fini+0x77/0xb0 [amdgpu]
      [   84.042857]  amdgpu_device_fini+0x38c/0x5d0 [amdgpu]
      [   84.048815]  amdgpu_driver_unload_kms+0x43/0x70 [amdgpu]
      [   84.055055]  drm_dev_unregister+0x73/0xb0 [drm]
      [   84.060499]  drm_dev_unplug+0x28/0x30 [drm]
      [   84.065598]  amdgpu_dev_uninit+0x1b/0x40 [amdgpu]
      [   84.071223]  amdgpu_pci_remove+0x4e/0x70 [amdgpu]
      [   84.076835]  pci_device_remove+0x3e/0xc0
      [   84.081609]  device_release_driver_internal+0xfb/0x1c0
      [   84.087558]  driver_detach+0x4d/0xa0
      [   84.092041]  bus_remove_driver+0x5f/0xe0
      [   84.096854]  driver_unregister+0x2f/0x50
      [   84.101594]  pci_unregister_driver+0x22/0xa0
      [   84.106806]  amdgpu_exit+0x15/0x2b [amdgpu]
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
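The fault in commit 9a029a3f comes from releasing a firmware handle that was never separately requested. A minimal userspace sketch of one defensive teardown pattern is shown here. It is not the actual patch: `fw_blob`, `psp_model`, `fw_release`, and `psp_model_fini` are hypothetical stand-ins, and plain `free()` takes the place of the kernel's `release_firmware()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real amdgpu or firmware-loader types. */
struct fw_blob {
    int id;
};

struct psp_model {
    struct fw_blob *asd_fw; /* NULL, or an alias of ta_fw when one image serves both */
    struct fw_blob *ta_fw;
};

static int release_count; /* number of blobs actually freed, for inspection */

/* Release one blob; a NULL handle is ignored. */
static void fw_release(struct fw_blob *fw)
{
    if (fw == NULL)
        return;
    free(fw);
    release_count++;
}

/* Teardown that frees each distinct blob exactly once and clears both handles. */
static void psp_model_fini(struct psp_model *psp)
{
    assert(psp != NULL);
    if (psp->asd_fw != psp->ta_fw)
        fw_release(psp->asd_fw);
    fw_release(psp->ta_fw);
    psp->asd_fw = NULL;
    psp->ta_fw = NULL;
}
```

Calling `psp_model_fini` on a structure whose `asd_fw` aliases `ta_fw` frees the shared blob once. Clearing the handles afterwards makes a repeated teardown harmless.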
  4. 06 Jan, 2021 3 commits
    • drm/amdgpu: fix potential memory leak during navi12 deinitialization · 0d232dad
      Jiawei Gu committed
      Navi12 HDCP & DTM deinitialization needs to continue and free the bo if
      it was already created, even though the initialized flag is not set.
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Jiawei Gu <Jiawei.Gu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: do optimization for psp command submit · 57995aa8
      pengzhou committed
      In the psp command submit logic, msleep(1) delays for too long.
      Change it to usleep_range(10, 100) for better performance.
      Signed-off-by: Peng Ju Zhou <PengJu.Zhou@amd.com>
      Reviewed-by: Emily.Deng <Emily.Deng@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: fix a memory protection fault when remove amdgpu device · eb5f4f46
      Dennis Li committed
      ASD and TA share the same firmware on SIENNA_CICHLID, and only the TA
      firmware is requested during boot, so only the TA firmware needs to be
      released when the device is removed.
      
      [   83.877150] general protection fault, probably for non-canonical address 0x1269f97e6ed04095: 0000 [#1] SMP PTI
      [   83.888076] CPU: 0 PID: 1312 Comm: modprobe Tainted: G        W  OE     5.9.0-rc5-deli-amd-vangogh-0.0.6.6-114-gdd99d5669a96-dirty #2
      [   83.901160] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018
      [   83.912353] RIP: 0010:free_fw_priv+0xc/0x120
      [   83.917531] Code: e8 99 cd b0 ff b8 a1 ff ff ff eb 9f 4c 89 f7 e8 8a cd b0 ff b8 f4 ff ff ff eb 90 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <4c> 8b 67 18 48 89 fb 4c 89 e7 e8 45 94 41 00 b8 ff ff ff ff f0 0f
      [   83.937576] RSP: 0018:ffffbc34c13a3ce0 EFLAGS: 00010206
      [   83.943699] RAX: ffffffffbb681850 RBX: ffffa047f117eb60 RCX: 0000000080800055
      [   83.951879] RDX: ffffbc34c1d5f000 RSI: 0000000080800055 RDI: 1269f97e6ed04095
      [   83.959955] RBP: ffffbc34c13a3cf0 R08: 0000000000000000 R09: 0000000000000001
      [   83.968107] R10: ffffbc34c13a3cc8 R11: 00000000ffffff00 R12: ffffa047d6b23378
      [   83.976166] R13: ffffa047d6b23338 R14: ffffa047d6b240c8 R15: 0000000000000000
      [   83.984295] FS:  00007f74f6712540(0000) GS:ffffa047fbe00000(0000) knlGS:0000000000000000
      [   83.993323] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   84.000056] CR2: 0000556a1cca4e18 CR3: 000000021faa8004 CR4: 00000000003706f0
      [   84.008128] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   84.016155] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   84.024174] Call Trace:
      [   84.027514]  release_firmware.part.11+0x4b/0x70
      [   84.033017]  release_firmware+0x13/0x20
      [   84.037803]  psp_sw_fini+0x77/0xb0 [amdgpu]
      [   84.042857]  amdgpu_device_fini+0x38c/0x5d0 [amdgpu]
      [   84.048815]  amdgpu_driver_unload_kms+0x43/0x70 [amdgpu]
      [   84.055055]  drm_dev_unregister+0x73/0xb0 [drm]
      [   84.060499]  drm_dev_unplug+0x28/0x30 [drm]
      [   84.065598]  amdgpu_dev_uninit+0x1b/0x40 [amdgpu]
      [   84.071223]  amdgpu_pci_remove+0x4e/0x70 [amdgpu]
      [   84.076835]  pci_device_remove+0x3e/0xc0
      [   84.081609]  device_release_driver_internal+0xfb/0x1c0
      [   84.087558]  driver_detach+0x4d/0xa0
      [   84.092041]  bus_remove_driver+0x5f/0xe0
      [   84.096854]  driver_unregister+0x2f/0x50
      [   84.101594]  pci_unregister_driver+0x22/0xa0
      [   84.106806]  amdgpu_exit+0x15/0x2b [amdgpu]
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
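The psp command submit change (57995aa8) shortens the sleep between polls so that a completed command is noticed sooner. The loop below is a rough userspace analogue, assuming a POSIX system: `nanosleep()` stands in for the kernel's `usleep_range()`, and `poll_until_ready` and `countdown_ready` are hypothetical names that do not appear in the driver.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical poll loop: call ready() up to max_polls times, sleeping
 * sleep_us microseconds after each unsuccessful check.
 * Returns the number of checks used, or -1 if the condition never held.
 */
static int poll_until_ready(bool (*ready)(void *), void *ctx,
                            long sleep_us, int max_polls)
{
    struct timespec ts;

    assert(ready != NULL);
    ts.tv_sec = sleep_us / 1000000L;
    ts.tv_nsec = (sleep_us % 1000000L) * 1000L;

    for (int i = 1; i <= max_polls; i++) {
        if (ready(ctx))
            return i;
        nanosleep(&ts, NULL); /* a shorter sleep means faster detection */
    }
    return -1;
}

/* Example condition: becomes true once the counter has reached zero. */
static bool countdown_ready(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

With a counter starting at 3, `poll_until_ready(countdown_ready, &remaining, 10, 100)` succeeds on the fourth check. The time spent waiting scales with `sleep_us`, which is why a millisecond-scale sleep costs much more than a range of tens of microseconds when the command usually completes quickly.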
  5. 02 Dec, 2020 1 commit
  6. 25 Nov, 2020 1 commit
  7. 14 Nov, 2020 1 commit
  8. 05 Nov, 2020 1 commit
  9. 04 Nov, 2020 1 commit
  10. 03 Nov, 2020 2 commits
  11. 27 Oct, 2020 1 commit
  12. 22 Oct, 2020 2 commits
  13. 17 Oct, 2020 1 commit
  14. 16 Oct, 2020 1 commit
  15. 13 Oct, 2020 2 commits
  16. 06 Oct, 2020 2 commits
  17. 26 Sep, 2020 1 commit
  18. 16 Sep, 2020 4 commits
  19. 27 Aug, 2020 2 commits
  20. 25 Aug, 2020 2 commits
  21. 15 Aug, 2020 2 commits
  22. 07 Aug, 2020 2 commits
  23. 31 Jul, 2020 1 commit
  24. 28 Jul, 2020 1 commit
    • drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Dennis Li committed
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.
      
      During GPU reset and resume, it is unsafe for other threads to access the
      GPU, which may cause the GPU reset to fail. Therefore a new rw_semaphore,
      adev->reset_sem, is introduced to protect the GPU from being accessed by
      external threads during recovery.
      
      v2:
      1. add rwlock for some ioctls, debugfs and file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in kfd
      driver.
      3. remove try_lock and change adev->in_gpu_reset to an atomic, to avoid
      re-entering GPU recovery for the same GPU hang.
      
      v3:
      1. change back to using adev->reset_sem to protect kfd callback
      functions, because dqm_lock couldn't protect all code paths; for example,
      free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove commented-out code in amdgpu_device.c
      3. add a more detailed comment in the commit message
      4. define a wrapper function amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tukov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
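The re-entry guard described in commit df9c8d1a (an atomic in-reset flag, plus a small wrapper to query it) can be sketched in userspace with C11 atomics. This models only the flag, not the rw_semaphore adev->reset_sem. `device_model`, `try_begin_reset`, `end_reset`, and `model_in_reset` are hypothetical names chosen for illustration; only `amdgpu_in_reset` is named in the commit message, and its real implementation is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; the real driver keeps this state in amdgpu_device. */
struct device_model {
    atomic_int in_reset; /* 0 = idle, 1 = recovery in progress */
};

static void device_model_init(struct device_model *dev)
{
    assert(dev != NULL);
    atomic_init(&dev->in_reset, 0);
}

/*
 * Try to become the single owner of recovery. Only the caller that flips
 * the flag from 0 to 1 gets true; every concurrent or nested caller gets
 * false and should skip recovery for this hang.
 */
static bool try_begin_reset(struct device_model *dev)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&dev->in_reset, &expected, 1);
}

/* Mark recovery as finished so that a later hang can be handled. */
static void end_reset(struct device_model *dev)
{
    atomic_store(&dev->in_reset, 0);
}

/* Query wrapper, in the spirit of the amdgpu_in_reset helper from v4. */
static bool model_in_reset(struct device_model *dev)
{
    return atomic_load(&dev->in_reset) != 0;
}
```

A second `try_begin_reset` call made before `end_reset` returns false. That is the property the commit relies on when several paths detect the same GPU hang.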