1. 18 Dec 2021, 1 commit
    • drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fence · bf67014d
      Committed by Huang Rui
      The job-embedded fence doesn't initialize its flags at
      dma_fence_init(). As a result, the amdgpu_fence_get_timeline_name
      callback takes the wrong path and triggers a NULL pointer panic
      once the trace event is enabled. So introduce a new amdgpu_fence
      object to indicate the job-embedded fence.
      
      [  156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0
      [  156.131804] #PF: supervisor read access in kernel mode
      [  156.131811] #PF: error_code(0x0000) - not-present page
      [  156.131817] PGD 0 P4D 0
      [  156.131824] Oops: 0000 [#1] PREEMPT SMP PTI
      [  156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G           OE     5.16.0-rc1-custom #1
      [  156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
      [  156.131848] RIP: 0010:strlen+0x0/0x20
      [  156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
      [  156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206
      [  156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b
      [  156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0
      [  156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000
      [  156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0
      [  156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007
      [  156.131914] FS:  0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000
      [  156.131923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0
      [  156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  156.131949] Call Trace:
      [  156.131953]  <TASK>
      [  156.131957]  ? trace_event_raw_event_dma_fence+0xcc/0x200
      [  156.131973]  ? ring_buffer_unlock_commit+0x23/0x130
      [  156.131982]  dma_fence_init+0x92/0xb0
      [  156.131993]  amdgpu_fence_emit+0x10d/0x2b0 [amdgpu]
      [  156.132302]  amdgpu_ib_schedule+0x2f9/0x580 [amdgpu]
      [  156.132586]  amdgpu_job_run+0xed/0x220 [amdgpu]
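      
      The fix, per the title, adds a dedicated amdgpu_fence object so the
      timeline-name callback can tell a job-embedded fence from a
      standalone one. A minimal sketch of the idea; the struct layout,
      the to_amdgpu_ring() helper, and the flag name (taken from the
      related hw_fence commit below) are assumptions:
      
      /* Sketch only: a dedicated object for fences that are NOT embedded
       * in a job, so callbacks can tell the two cases apart. */
      struct amdgpu_fence {
              struct dma_fence base;
              struct amdgpu_ring *ring;       /* ring this fence belongs to */
      };
      
      static const char *amdgpu_fence_get_timeline_name(struct dma_fence *f)
      {
              struct amdgpu_ring *ring;
      
              if (test_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &f->flags)) {
                      /* The fence lives inside an amdgpu_job; recover the
                       * ring through the job instead of misinterpreting
                       * the fence as a standalone amdgpu_fence. */
                      struct amdgpu_job *job =
                              container_of(f, struct amdgpu_job, hw_fence);
      
                      ring = to_amdgpu_ring(job->base.sched);
              } else {
                      ring = container_of(f, struct amdgpu_fence, base)->ring;
              }
      
              return (const char *)ring->name;
      }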
      
      v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot)
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  2. 02 Dec 2021, 2 commits
  3. 25 Nov 2021, 1 commit
  4. 18 Nov 2021, 1 commit
  5. 10 Nov 2021, 1 commit
  6. 06 Nov 2021, 1 commit
  7. 04 Nov 2021, 2 commits
  8. 29 Oct 2021, 1 commit
  9. 20 Oct 2021, 1 commit
  10. 14 Oct 2021, 1 commit
  11. 09 Oct 2021, 1 commit
  12. 07 Oct 2021, 1 commit
  13. 06 Oct 2021, 6 commits
  14. 05 Oct 2021, 7 commits
  15. 30 Sep 2021, 1 commit
  16. 24 Sep 2021, 1 commit
  17. 16 Sep 2021, 1 commit
  18. 15 Sep 2021, 2 commits
  19. 21 Aug 2021, 2 commits
    • drm/amdgpu: Cancel delayed work when GFXOFF is disabled · 32bc8f83
      Committed by Michel Dänzer
      schedule_delayed_work does not push back the work if it was already
      scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
      after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
      was disabled and re-enabled again during those 100 ms.
      
      This resulted in frame drops / stutter with the upcoming mutter 41
      release on Navi 14, due to constantly enabling GFXOFF in the HW and
      disabling it again (for getting the GPU clock counter).
      
      To fix this, call cancel_delayed_work_sync when the disable count
      transitions from 0 to 1, and only schedule the delayed work on the
      reverse transition, not if the disable count was already 0. This makes
      sure the delayed work doesn't run at unexpected times, and allows it to
      be lock-free.
      
      v2:
      * Use cancel_delayed_work_sync & mutex_trylock instead of
        mod_delayed_work.
      v3:
      * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
      v4:
      * Fix race condition between amdgpu_gfx_off_ctrl incrementing
        adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
        checking for it to be 0 (Evan Quan)
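      
      A minimal sketch of the resulting v4 logic; the function and field
      names follow the commit message, while the delay constant and
      remaining details are assumptions:
      
      /* Sketch only: count GFXOFF disable requests and only touch the
       * delayed work on 0 <-> 1 transitions. */
      void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
      {
              mutex_lock(&adev->gfx.gfx_off_mutex);
      
              if (!enable) {
                      /* 0 -> 1 transition: make sure the delayed re-enable
                       * work is not running and cannot start. Safe under
                       * the mutex because the handler is lock-free (v3). */
                      if (adev->gfx.gfx_off_req_count == 0)
                              cancel_delayed_work_sync(&adev->gfx.gfx_off_delay_work);
                      adev->gfx.gfx_off_req_count++;
              } else if (adev->gfx.gfx_off_req_count &&
                         --adev->gfx.gfx_off_req_count == 0) {
                      /* 1 -> 0 transition only: re-enable GFXOFF after a
                       * delay; never reschedule when the count was already
                       * 0, which caused the early ~100 ms expiry. */
                      schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
                                            msecs_to_jiffies(100));
              }
      
              mutex_unlock(&adev->gfx.gfx_off_mutex);
      }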
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3
      Acked-by: Christian König <christian.koenig@amd.com> # v3
      Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    • drm/amdgpu: Cancel delayed work when GFXOFF is disabled · 90a92662
      Committed by Michel Dänzer
      schedule_delayed_work does not push back the work if it was already
      scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms
      after the first time GFXOFF was disabled and re-enabled, even if GFXOFF
      was disabled and re-enabled again during those 100 ms.
      
      This resulted in frame drops / stutter with the upcoming mutter 41
      release on Navi 14, due to constantly enabling GFXOFF in the HW and
      disabling it again (for getting the GPU clock counter).
      
      To fix this, call cancel_delayed_work_sync when the disable count
      transitions from 0 to 1, and only schedule the delayed work on the
      reverse transition, not if the disable count was already 0. This makes
      sure the delayed work doesn't run at unexpected times, and allows it to
      be lock-free.
      
      v2:
      * Use cancel_delayed_work_sync & mutex_trylock instead of
        mod_delayed_work.
      v3:
      * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König)
      v4:
      * Fix race condition between amdgpu_gfx_off_ctrl incrementing
        adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off
        checking for it to be 0 (Evan Quan)
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Evan Quan <evan.quan@amd.com>
      Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3
      Acked-by: Christian König <christian.koenig@amd.com> # v3
      Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  20. 19 Aug 2021, 1 commit
  21. 17 Aug 2021, 1 commit
    • drm/amd/amdgpu: embed hw_fence into amdgpu_job · c530b02f
      Committed by Jack Zhang
      Why: Previously the hw fence was allocated separately from the job,
      which caused lifetime issues and corner cases. The ideal situation
      is to let the fence manage both the job's and its own lifetime, and
      to simplify the design of the gpu-scheduler.
      
      How:
      Embed the hw_fence into amdgpu_job.
      1. Normal job submission is covered by this method.
      2. For ib_test and submissions without a parent job, keep the
      legacy way of creating a hw fence separately.
      v2:
      use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
      embedded in a job.
      v3:
      remove redundant variable ring in amdgpu_job
      v4:
      add tdr sequence support for this feature. Add a job_run_counter to
      indicate whether this job is a resubmit job.
      v5:
      add missing handling in amdgpu_fence_enable_signaling
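      
      A minimal sketch of the embedding; the flag comes from the v2 note
      above, while the struct layout and emit-path details are
      assumptions:
      
      struct amdgpu_job {
              struct drm_sched_job base;
              struct dma_fence hw_fence;      /* embedded, no separate alloc */
              unsigned int job_run_counter;   /* >0 means resubmitted (v4) */
              /* ... */
      };
      
      /* In amdgpu_fence_emit(): use the embedded fence when a parent job
       * exists, otherwise keep the legacy separate allocation. */
      if (job) {
              fence = &job->hw_fence;
      } else {
              struct amdgpu_fence *af = kmem_cache_alloc(amdgpu_fence_slab,
                                                         GFP_KERNEL);
              if (!af)
                      return -ENOMEM;
              fence = &af->base;
      }
      /* ... */
      dma_fence_init(fence, /* ops, lock, context, seq */ ...);
      if (job)        /* set after init, which clears the flags */
              set_bit(AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT, &fence->flags);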
      Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
      Signed-off-by: Jack Zhang <Jack.Zhang7@hotmail.com>
      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Reviewed-by: Monk Liu <monk.liu@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  22. 10 Aug 2021, 1 commit
  23. 06 Aug 2021, 1 commit
  24. 29 Jul 2021, 1 commit
  25. 27 Jul 2021, 1 commit
    • drm/amdgpu: Fix resource leak on probe error path · d47255d3
      Committed by Jiri Kosina
      This reverts commit 4192f7b5.
      
      It is not true (as stated in the reverted commit changelog) that we never
      unmap the BAR on failure; it actually does happen properly on
      amdgpu_driver_load_kms() -> amdgpu_driver_unload_kms() ->
      amdgpu_device_fini() error path.
      
      What's worse, this commit actually completely breaks resource freeing on
      probe failure (like e.g. failure to load microcode), as
      amdgpu_driver_unload_kms() notices adev->rmmio being NULL and bails too
      early, leaving all the resources that'd normally be freed in
      amdgpu_acpi_fini() and amdgpu_device_fini() still hanging around, leading
      to all sorts of oopses when someone tries to, for example, access the
      sysfs and procfs resources which are still around while the driver is
      gone.
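      
      A sketch of the failure mode described above; the rmmio check is
      named in the message, while the surrounding teardown calls are
      assumptions:
      
      /* Sketch only: with 4192f7b5 applied, a probe failure had already
       * unmapped the BAR and cleared adev->rmmio, so unload bailed here
       * and skipped all of the real teardown. */
      void amdgpu_driver_unload_kms(struct drm_device *dev)
      {
              struct amdgpu_device *adev = dev->dev_private;
      
              if (adev->rmmio == NULL)
                      return; /* leaks sysfs/procfs entries and everything
                               * else the calls below would have freed */
      
              amdgpu_acpi_fini(adev);
              amdgpu_device_fini(adev);
              /* ... */
      }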
      
      Fixes: 4192f7b5 ("drm/amdgpu: unmap register bar on device init failure")
      Reported-by: Vojtech Pavlik <vojtech@ucw.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Cc: stable@vger.kernel.org