1. 25 Aug, 2020 1 commit
  2. 19 Aug, 2020 2 commits
  3. 15 Aug, 2020 3 commits
  4. 07 Aug, 2020 1 commit
  5. 31 Jul, 2020 2 commits
  6. 28 Jul, 2020 2 commits
    • E
      drm/amd/powerplay: revise the outputs layout of amdgpu_pm_info debugfs · 81b41ff5
      Evan Quan committed
      The amdgpu_pm_info debugfs output currently shows the clock gating
      status first, followed by the current clock/power information. However,
      retrieving the clock gating status may pull GFX out of its CG state,
      which makes the clock/power information retrieved afterwards inaccurate.

      To overcome this with minimal impact, the output is reordered to show
      the current clock/power information first.
      Signed-off-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      81b41ff5
    • D
      drm/amdgpu: fix system hang issue during GPU reset · df9c8d1a
      Dennis Li committed
      When the GPU hangs, the driver has multiple paths into
      amdgpu_device_gpu_recover; the atomics adev->in_gpu_reset and
      hive->in_reset are used to avoid re-entering GPU recovery.

      During GPU reset and resume, it is unsafe for other threads to access
      the GPU, since that may cause the reset to fail. Therefore a new
      rw_semaphore, adev->reset_sem, is introduced to protect the GPU from
      being accessed by external threads during recovery.
      
      v2:
      1. add the rwlock to some ioctls, debugfs and the file-close function.
      2. change to use dqm->is_resetting and dqm_lock for protection in the
      kfd driver.
      3. remove try_lock and make adev->in_gpu_reset atomic, to avoid
      re-entering GPU recovery for the same GPU hang.

      v3:
      1. change back to using adev->reset_sem to protect the kfd callback
      functions, because dqm_lock cannot protect all of the code. For example,
      free_mqd must be called outside of dqm_lock:
      
      [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019
      [ 1230.177221] Call Trace:
      [ 1230.178249]  dump_stack+0x98/0xd5
      [ 1230.179443]  amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu]
      [ 1230.180673]  gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu]
      [ 1230.181882]  amdgpu_gart_unbind+0xa9/0xe0 [amdgpu]
      [ 1230.183098]  amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu]
      [ 1230.184239]  ? ttm_bo_put+0x171/0x5f0 [ttm]
      [ 1230.185394]  ttm_tt_unbind+0x21/0x40 [ttm]
      [ 1230.186558]  ttm_tt_destroy.part.12+0x12/0x60 [ttm]
      [ 1230.187707]  ttm_tt_destroy+0x13/0x20 [ttm]
      [ 1230.188832]  ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm]
      [ 1230.189979]  ttm_bo_put+0x1be/0x5f0 [ttm]
      [ 1230.191230]  amdgpu_bo_unref+0x1e/0x30 [amdgpu]
      [ 1230.192522]  amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu]
      [ 1230.193833]  free_mqd+0x25/0x40 [amdgpu]
      [ 1230.195143]  destroy_queue_cpsch+0x1a7/0x270 [amdgpu]
      [ 1230.196475]  pqm_destroy_queue+0x105/0x260 [amdgpu]
      [ 1230.197819]  kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu]
      [ 1230.199154]  kfd_ioctl+0x277/0x500 [amdgpu]
      [ 1230.200458]  ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu]
      [ 1230.201656]  ? tomoyo_file_ioctl+0x19/0x20
      [ 1230.202831]  ksys_ioctl+0x98/0xb0
      [ 1230.204004]  __x64_sys_ioctl+0x1a/0x20
      [ 1230.205174]  do_syscall_64+0x5f/0x250
      [ 1230.206339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      2. remove try_lock and introduce the atomic hive->in_reset, to avoid
      re-entering GPU recovery.
      
      v4:
      1. remove an unnecessary whitespace change in kfd_chardev.c
      2. remove commented-out code in amdgpu_device.c
      3. add a more detailed description to the commit message
      4. define a wrapper function, amdgpu_in_reset
      
      v5:
      1. Fix some style issues.
      Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
      Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Suggested-by: Christian König <christian.koenig@amd.com>
      Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
      Suggested-by: Luben Tuikov <luben.tuikov@amd.com>
      Signed-off-by: Dennis Li <Dennis.Li@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      df9c8d1a
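
The locking scheme described in the commit above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the driver's code: `RWLock`, `Device`, `gpu_recover`, `access` and the `on_reset` hook are hypothetical names, the kernel's rw_semaphore is approximated with a `threading.Condition`, and the atomic flag is emulated with a small lock. An atomic in-reset flag rejects a second recovery attempt for the same hang, and the write side of the reader-writer lock keeps ordinary accessors (the read side) out while the reset runs.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal reader-writer lock: many readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:              # wait until no reset is running
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:   # wait for exclusive access
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class Device:
    """Hypothetical stand-in for a GPU device with a reset lock."""

    def __init__(self):
        self.reset_sem = RWLock()
        self._flag_lock = threading.Lock()   # emulates an atomic compare-and-swap
        self.in_reset = False
        self.resets_done = 0

    def _try_enter_reset(self):
        with self._flag_lock:
            if self.in_reset:
                return False                 # recovery already in progress
            self.in_reset = True
            return True

    def gpu_recover(self, on_reset=None):
        if not self._try_enter_reset():
            return False                     # do not re-enter recovery
        try:
            with self.reset_sem.write():     # block all readers during reset
                if on_reset is not None:
                    on_reset()               # hypothetical hook for the reset body
                self.resets_done += 1
        finally:
            with self._flag_lock:
                self.in_reset = False
        return True

    def access(self):
        with self.reset_sem.read():          # ioctl/debugfs-style access path
            return "ok"
```

A second `gpu_recover()` call made while a reset is in progress returns `False` immediately instead of queueing behind the first one, which is the behaviour the atomic flag is meant to provide.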
  7. 23 Jul, 2020 1 commit
    • A
      drm/amdgpu/powerplay: add some documentation about memory clock · ccda42a4
      Alex Deucher committed
      We expose the actual memory controller clock rate in Linux,
      not the effective memory clock of the DRAMs.  To convert
      between them, use the following formulas:

      Clock conversion (MHz):
      HBM: effective_memory_clock = memory_controller_clock * 1
      G5:  effective_memory_clock = memory_controller_clock * 1
      G6:  effective_memory_clock = memory_controller_clock * 2
      
      DRAM data rate (MT/s):
      HBM: effective_memory_clock * 2 = data_rate
      G5:  effective_memory_clock * 4 = data_rate
      G6:  effective_memory_clock * 8 = data_rate
      
      Bandwidth (MB/s):
      data_rate * vram_bit_width / 8 = memory_bandwidth
      
      Some examples:
      G5 on RX460:
      memory_controller_clock = 1750 MHz
      effective_memory_clock = 1750 MHz * 1 = 1750 MHz
      data rate = 1750 * 4 = 7000 MT/s
      memory_bandwidth = 7000 * 128 bits / 8 = 112000 MB/s
      
      G6 on RX5600:
      memory_controller_clock = 900 MHz
      effective_memory_clock = 900 MHz * 2 = 1800 MHz
      data rate = 1800 * 8 = 14400 MT/s
      memory_bandwidth = 14400 * 192 bits / 8 = 345600 MB/s
      Acked-by: Evan Quan <evan.quan@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      ccda42a4
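
The conversion formulas in the commit above can be checked with a few lines of Python. The multipliers are taken directly from the commit message; the function name `memory_figures` and the table names are hypothetical, chosen only for this sketch.

```python
# Multipliers copied from the commit message.
CLOCK_MULT = {"HBM": 1, "G5": 1, "G6": 2}   # controller clock -> effective clock
RATE_MULT = {"HBM": 2, "G5": 4, "G6": 8}    # effective clock -> data rate


def memory_figures(mem_type, controller_clock_mhz, vram_bit_width):
    """Return (effective clock in MHz, data rate in MT/s, bandwidth in MB/s)."""
    effective = controller_clock_mhz * CLOCK_MULT[mem_type]
    data_rate = effective * RATE_MULT[mem_type]
    bandwidth = data_rate * vram_bit_width // 8   # bits -> bytes
    return effective, data_rate, bandwidth


# The two worked examples from the commit message.
print(memory_figures("G5", 1750, 128))   # RX460
print(memory_figures("G6", 900, 192))    # RX5600
```

Both calls reproduce the figures listed in the commit message: 112000 MB/s for the G5 example and 345600 MB/s for the G6 example.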
  8. 22 Jul, 2020 1 commit
  9. 16 Jul, 2020 1 commit
  10. 08 Jul, 2020 1 commit
  11. 03 Jul, 2020 2 commits
  12. 01 Jul, 2020 6 commits
  13. 18 Jun, 2020 2 commits
  14. 03 Jun, 2020 1 commit
  15. 30 May, 2020 3 commits
  16. 27 May, 2020 1 commit
    • K
      drm/amdgpu: fix device attribute node create failed with multi gpu · ba02fd6b
      Kevin Wang committed
      The original design uses the variable "attr->states" to save the
      states a node supports on the current GPU device. On a multi-GPU
      system, however, when the second GPU is probed the driver checks the
      attribute node states left by the previous GPU to decide whether to
      create each attribute node. This causes attribute node creation to fail
      on the other GPU devices.
      
      1. add member attr_list into amdgpu_device to link supported device attribute node.
      2. add new structure "struct amdgpu_device_attr_entry{}" to track device attribute state.
      3. drop member "states" from amdgpu_device_attr.
      
      v2:
      1. move "attr_list" into amdgpu_pm and rename it to "pm_attr_list".
      2. refine the parameters of the create & remove device node functions.
      
      fix:
      drm/amdgpu: optimize amdgpu device attribute code
      Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      ba02fd6b
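
The idea behind the fix above, keeping the "which attributes were created" state per device instead of on the shared attribute definition, might look like this in a simplified model. Everything here is an assumption made for illustration: `DeviceAttr`, `Device`, `create_attrs` and `remove_attrs` are hypothetical names, and only `pm_attr_list` and the sysfs file names are borrowed from the commit messages on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceAttr:
    """Shared, read-only attribute definition (carries no per-device state)."""
    name: str


# Shared table used by every device; names taken from this page's commits.
ATTRS = (
    DeviceAttr("power_dpm_force_performance_level"),
    DeviceAttr("pp_dpm_sclk"),
    DeviceAttr("pp_clk_voltage"),
)


@dataclass
class Device:
    name: str
    supported: frozenset
    # Per-device list of created attribute nodes.
    pm_attr_list: list = field(default_factory=list)


def create_attrs(dev):
    """Create only the nodes this device supports and track them on the device."""
    for attr in ATTRS:
        if attr.name in dev.supported:
            dev.pm_attr_list.append(attr.name)
    return list(dev.pm_attr_list)


def remove_attrs(dev):
    """Remove exactly the nodes that were created for this device."""
    dev.pm_attr_list.clear()
```

Because the shared `ATTRS` table is never modified, probing one device cannot change what a later device is allowed to create.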
  17. 23 May, 2020 3 commits
    • A
      drm/amdgpu: add apu flags (v2) · 54f78a76
      Alex Deucher committed
      Add some APU flags to simplify handling of different APU
      variants.  It's easier to understand the special cases
      if we use named flags rather than checking device ids and
      silicon revisions.
      
      v2: rebase on latest code
      Acked-by: Evan Quan <evan.quan@amd.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      54f78a76
    • C
      drm/amd/powerpay: Disable gfxoff when setting manual mode on picasso and raven · cbd2d08c
      chen gong committed
      [Problem description]
      1. Boot up a Picasso platform, launch the desktop, and do nothing (the APU enters the "gfxoff" state)
      2. Log in to the platform remotely over SSH, then run:
      	sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level"
      	sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz)
      3. Move the mouse around in a window
      4. Symptom: the screen freezes

      The tester switches the sclk level while glmark2 is running.
      The APU enters the "gfxoff" state intermittently while glmark2 is running.
      The system hangs if GFXCLK is fixed to 1400MHz while the APU is in the
      "gfxoff" state.
      
      [Debug]
      1. Fix SCLK to X MHz
      	1400: screen frozen, screen black, then OS will reboot.
      	1300: screen frozen.
      	1200: screen frozen, screen black.
      	1100: screen frozen, screen black, then OS will reboot.
      	1000: screen frozen, screen black.
      	900:  screen frozen, screen black, then OS will reboot.
      	800:  Situation normal, issue disappears.
      	700:  Situation normal, issue disappears.
      2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control":
      	50 : Situation normal, issue disappears.
      	45 : Situation normal, issue disappears.
      	40 : Situation normal, issue disappears.
      	35 : Situation normal, issue disappears.
      	30 : screen black.
      	25 : screen frozen, then blurred screen.
      	20 : screen frozen.
      	15 : screen black.
      	10 : screen frozen.
      	5  : screen frozen, then blurred screen.
      3. Disable GFXOFF feature
      	Situation normal, issue disappears.
      
      [Why]
      After a period of debugging with the Sys Eng and SMU teams, the Sys
      Eng team concluded that this is a voltage/frequency margin issue, not
      a F/W or H/W bug. The experiment shows that the default targetPsm
      [for f=1400MHz] is not sufficient when GFXOFF is enabled on Picasso.

      The SMU team considers it an odd test condition to force sclk to
      "1400MHz" while the GPU is in the "gfxoff" state and then wake up GFX.
      SCLK should be at the "lowest frequency" during gfxoff.

      [How]
      Disable gfxoff when setting manual mode.
      Re-enable gfxoff when setting any other mode (i.e. exiting manual mode).

      Also, from the user's point of view, a user who switches to manual
      mode and forces the SCLK frequency does not want SCLK to be controlled
      by the workload. Switching to manual mode becomes meaningless if the APU
      enters "gfxoff" due to lack of workload at that point.

      Note: the same issue is observed on Raven.
      Signed-off-by: chen gong <curry.gong@amd.com>
      Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      cbd2d08c
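
The [How] part of the commit above amounts to a small state transition rule, sketched here as a toy model. This is not the driver's implementation: the `Apu` class, its fields and `set_performance_level` are hypothetical, and the level names other than "manual" ("auto", "high") are assumed values for the performance level setting.

```python
class Apu:
    """Toy model of the gfxoff policy tied to the forced performance level."""

    def __init__(self):
        self.level = "auto"          # assumed default level
        self.gfxoff_enabled = True

    def set_performance_level(self, level):
        entering_manual = level == "manual" and self.level != "manual"
        leaving_manual = level != "manual" and self.level == "manual"
        if entering_manual:
            self.gfxoff_enabled = False   # keep GFX powered while SCLK is forced
        elif leaving_manual:
            self.gfxoff_enabled = True    # restore gfxoff for all other modes
        self.level = level


apu = Apu()
apu.set_performance_level("manual")
print(apu.gfxoff_enabled)
apu.set_performance_level("auto")
print(apu.gfxoff_enabled)
```

Switching between two non-manual levels leaves the gfxoff setting untouched; only entering or leaving manual mode toggles it.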
    • A
      drm/amdgpu: fix pm sysfs node handling (v2) · d5c8ffb9
      Alex Deucher committed
      Fix typos that prevented them from showing up.
      
      v2: switch other files in addition to pp_clk_voltage
      
      Fixes: 4e01847c ("drm/amdgpu: optimize amdgpu device attribute code")
      Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1150
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Evan Quan <evan.quan@amd.com>
      d5c8ffb9
  18. 22 May, 2020 3 commits
  19. 18 May, 2020 1 commit
  20. 24 Apr, 2020 2 commits
  21. 09 Apr, 2020 1 commit