1. 24 Aug 2017, 1 commit
  2. 25 May 2017, 1 commit
    • drm/amdgpu/SRIOV:implement guilty job TDR for(V2) · 65781c78
      Monk Liu committed
      1. TDR will kick out a guilty job once its hang count exceeds the
      threshold given by the kernel parameter "job_hang_limit"; that way
      a bad command stream cannot cause GPU hangs indefinitely.
      
      By default this threshold is 1, so a job will be kicked out
      after it hangs.
      
      2. If a job times out, the TDR routine will not reset every
      scheduler/ring; instead it will reset only the one indicated by
      the @job argument of amdgpu_sriov_gpu_reset. That way we don't
      need to reset and recover each scheduler/ring when we already
      know which job caused the GPU hang.
      
      3. Unblock sriov_gpu_reset for the AI family.
      
      V2:
      1. Kick out the guilty job after the scheduler is parked.
      2. Since parking the scheduler before the kick-out already takes
      a while, we can do a last check on the job in question before
      doing hw_reset.
      
      TODO:
      1. When a job is considered guilty, we should set a flag in its
      fence status so that the UMD side knows this fence signaled
      because the job hung, not because it completed.
      
      2. If a GPU reset causes all video memory to be lost, we need to
      introduce a new TDR policy, such as dropping all jobs not yet
      signaled and making every IOCTL on this device return ERROR
      DEVICE_LOST.
      This will be implemented later.
      Signed-off-by: Monk Liu <Monk.Liu@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
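A minimal user-space sketch of the guilty-job logic described in this commit, assuming "exceeds the threshold" means the per-job hang count must be strictly greater than the limit. The struct and function names (sketch_job, sketch_handle_timeout) are hypothetical and are not the amdgpu or scheduler API; the signaled check models the V2 "last check" on the job before hw_reset.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a scheduled GPU job (not the amdgpu struct). */
struct sketch_job {
    int karma;       /* times TDR has found this job hung */
    bool signaled;   /* job's fence completed while the scheduler was parked */
    bool kicked_out; /* job has been treated as guilty and dropped */
};

/*
 * Hypothetical timeout handler, called after the scheduler is parked.
 * hang_limit plays the role of the job_hang_limit parameter.
 * Returns true if a hardware reset is still needed.
 */
static bool sketch_handle_timeout(struct sketch_job *job, int hang_limit)
{
    /* Last check: parking took a while, so the job may have finished. */
    if (job->signaled)
        return false;

    /* Count this hang; the job is guilty once the count exceeds the limit. */
    if (++job->karma > hang_limit)
        job->kicked_out = true;

    return true;
}
```

Under this reading, with hang_limit = 1 the first timeout only raises karma to 1, and the second pushes it past the limit and marks the job as kicked out; a job whose fence signaled in the meantime triggers no reset at all.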
  3. 11 May 2017, 2 commits
  4. 29 Apr 2017, 1 commit
    • drm/amdgpu: fix NULL pointer error · a6bef67e
      Chunming Zhou committed
      [  141.420491] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [  141.420532] IP: [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
      [  141.420563] PGD 20a030067
      [  141.420575] PUD 2088ca067
      [  141.420587] PMD 0
      
      [  141.420599] Oops: 0000 [#1] SMP
      [  141.420612] Modules linked in: amdgpu(OE) ttm(OE) drm_kms_helper(E) drm(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) rpcsec_gss_krb5(E) nfsv4(E) nfs(E) fscache(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) snd_hda_codec_realtek(E) video(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) joydev(E) snd_hda_codec(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_hda_core(E) snd_hwdep(E) snd_rawmidi(E) snd_pcm(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) snd_seq(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_seq_device(E) snd_timer(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd(E) soundcore(E) serio_raw(E) shpchp(E) i2c_piix4(E) i2c_designware_platform(E) 8250_dw(E) i2c_designware_core(E) mac_hid(E) binfmt_misc(E)
      [  141.420948]  nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) r8169(E) ahci(E) mii(E) libahci(E) wmi(E)
      [  141.421042] CPU: 14 PID: 223 Comm: kworker/14:2 Tainted: G           OE   4.9.0-custom #4
      [  141.421074] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 0606 04/06/2017
      [  141.421146] Workqueue: events amd_sched_job_timedout [amdgpu]
      [  141.421169] task: ffff88020b03ba80 task.stack: ffffc900016f4000
      [  141.421193] RIP: 0010:[<ffffffff81579ee1>]  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
      [  141.421229] RSP: 0018:ffffc900016f7d30  EFLAGS: 00010202
      [  141.421250] RAX: ffff8801c049fc00 RBX: ffff8801d4d8dc00 RCX: 0000000000000000
      [  141.421278] RDX: 0000000000000001 RSI: ffff8801c049fcc0 RDI: 0000000000000000
      [  141.421307] RBP: ffffc900016f7d48 R08: 0000000000000000 R09: 0000000000000000
      [  141.421334] R10: 00000020ed512a30 R11: 0000000000000001 R12: 0000000000000000
      [  141.421362] R13: ffff880209ba4ba0 R14: ffff880209ba4c58 R15: ffff8801c055cc60
      [  141.421390] FS:  0000000000000000(0000) GS:ffff88021ef80000(0000) knlGS:0000000000000000
      [  141.421421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  141.421443] CR2: 0000000000000030 CR3: 000000020b554000 CR4: 00000000003406e0
      [  141.421471] Stack:
      [  141.421480]  ffff8801d4d8dc00 ffff880209ba4c48 ffff880209ba4ba0 ffffc900016f7d78
      [  141.421513]  ffffffffa0697920 ffff880209ba0000 0000000000000000 ffff880209ba2770
      [  141.421549]  ffff880209ba4b08 ffffc900016f7df0 ffffffffa05ce2ae ffffffffa0509eb7
      [  141.421583] Call Trace:
      [  141.421628]  [<ffffffffa0697920>] amd_sched_hw_job_reset+0x50/0xb0 [amdgpu]
      [  141.421676]  [<ffffffffa05ce2ae>] amdgpu_gpu_reset+0x8e/0x690 [amdgpu]
      [  141.421712]  [<ffffffffa0509eb7>] ? drm_printk+0x97/0xa0 [drm]
      [  141.421770]  [<ffffffffa0698156>] amdgpu_job_timedout+0x46/0x50 [amdgpu]
      [  141.421829]  [<ffffffffa0696a07>] amd_sched_job_timedout+0x17/0x20 [amdgpu]
      [  141.421859]  [<ffffffff81095493>] process_one_work+0x153/0x3f0
      [  141.421884]  [<ffffffff81095c5b>] worker_thread+0x12b/0x4b0
      [  141.421907]  [<ffffffff81095b30>] ? rescuer_thread+0x350/0x350
      [  141.421931]  [<ffffffff8109b423>] kthread+0xd3/0xf0
      [  141.421951]  [<ffffffff8109b350>] ? kthread_park+0x60/0x60
      [  141.421975]  [<ffffffff817e1ee5>] ret_from_fork+0x25/0x30
      [  141.421996] Code: ac 81 e8 a3 1f b0 ff 48 c7 c0 ea ff ff ff e9 48 ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 <48> 8b 7f 30 48 89 f3 e8 73 7c 26 00 48 8b 13 48 39 d3 41 0f 95
      [  141.422156] RIP  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
      [  141.422183]  RSP <ffffc900016f7d30>
      [  141.422197] CR2: 0000000000000030
      [  141.433483] ---[ end trace bc0949bf7ddd6d4b ]---
      
      If the job is reset twice, its parent fence can be NULL.
      Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
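The oops is consistent with a NULL fence argument: RDI, which carries the first argument on x86-64, is 0, and the faulting instruction (48 8b 7f 30) reads from 0x30(%rdi), matching CR2 = 0x30. Below is a simplified user-space model of the failure and of a NULL guard, assuming an earlier reset already cleared the parent pointer. All types and functions here (sketch_fence, sketch_sched_fence, sketch_remove_callback, sketch_hw_job_reset) are hypothetical stand-ins, not the dma-fence or scheduler API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hardware fence: only tracks an attached callback. */
struct sketch_fence {
    bool has_cb;
};

/* Hypothetical scheduler fence; parent points at the hardware fence, if any. */
struct sketch_sched_fence {
    struct sketch_fence *parent;
};

/* Dereferences fence unconditionally, so passing NULL would crash. */
static bool sketch_remove_callback(struct sketch_fence *fence)
{
    bool removed = fence->has_cb;
    fence->has_cb = false;
    return removed;
}

/* Detach a job from its hardware fence; safe to call more than once. */
static void sketch_hw_job_reset(struct sketch_sched_fence *s_fence)
{
    /* Guard: skip the removal when an earlier reset already cleared parent. */
    if (s_fence->parent && sketch_remove_callback(s_fence->parent)) {
        /* A real implementation would also release its reference here. */
        s_fence->parent = NULL;
    }
}
```

Calling sketch_hw_job_reset() twice on the same scheduler fence is then harmless: the second call sees parent == NULL and does nothing, instead of dereferencing it.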
  5. 30 Mar 2017, 2 commits
  6. 02 Mar 2017, 1 commit
  7. 01 Nov 2016, 1 commit
  8. 25 Oct 2016, 2 commits
    • dma-buf: Rename struct fence to dma_fence · f54d1867
      Chris Wilson committed
      I plan to usurp the short name of struct fence for a core kernel struct,
      and so I need to rename the specialised fence/timeline for DMA
      operations to make room.
      
      A consensus was reached in
      https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
      that making clear this fence applies to DMA operations was a good thing.
      Since then the patch has grown a bit as usage increases, so hopefully it
      remains a good thing!
      
      (v2...: rebase, rerun spatch)
      v3: Compile on msm, spotted a manual fixup that I broke.
      v4: Try again for msm, sorry Daniel
      
      coccinelle script:
      @@
      
      @@
      - struct fence
      + struct dma_fence
      @@
      
      @@
      - struct fence_ops
      + struct dma_fence_ops
      @@
      
      @@
      - struct fence_cb
      + struct dma_fence_cb
      @@
      
      @@
      - struct fence_array
      + struct dma_fence_array
      @@
      
      @@
      - enum fence_flag_bits
      + enum dma_fence_flag_bits
      @@
      
      @@
      (
      - fence_init
      + dma_fence_init
      |
      - fence_release
      + dma_fence_release
      |
      - fence_free
      + dma_fence_free
      |
      - fence_get
      + dma_fence_get
      |
      - fence_get_rcu
      + dma_fence_get_rcu
      |
      - fence_put
      + dma_fence_put
      |
      - fence_signal
      + dma_fence_signal
      |
      - fence_signal_locked
      + dma_fence_signal_locked
      |
      - fence_default_wait
      + dma_fence_default_wait
      |
      - fence_add_callback
      + dma_fence_add_callback
      |
      - fence_remove_callback
      + dma_fence_remove_callback
      |
      - fence_enable_sw_signaling
      + dma_fence_enable_sw_signaling
      |
      - fence_is_signaled_locked
      + dma_fence_is_signaled_locked
      |
      - fence_is_signaled
      + dma_fence_is_signaled
      |
      - fence_is_later
      + dma_fence_is_later
      |
      - fence_later
      + dma_fence_later
      |
      - fence_wait_timeout
      + dma_fence_wait_timeout
      |
      - fence_wait_any_timeout
      + dma_fence_wait_any_timeout
      |
      - fence_wait
      + dma_fence_wait
      |
      - fence_context_alloc
      + dma_fence_context_alloc
      |
      - fence_array_create
      + dma_fence_array_create
      |
      - to_fence_array
      + to_dma_fence_array
      |
      - fence_is_array
      + dma_fence_is_array
      |
      - trace_fence_emit
      + trace_dma_fence_emit
      |
      - FENCE_TRACE
      + DMA_FENCE_TRACE
      |
      - FENCE_WARN
      + DMA_FENCE_WARN
      |
      - FENCE_ERR
      + DMA_FENCE_ERR
      )
       (
       ...
       )
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
      Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
    • drm/amdgpu: fix sched fence slab teardown · a053fb7e
      Grazvydas Ignotas committed
      To free fences, call_rcu() is used, which calls amd_sched_fence_free()
      after a grace period. During teardown, there is no guarantee all
      callbacks have finished, so sched_fence_slab may be destroyed before
      all fences have been freed. If we are lucky, this results in some slab
      warnings; if not, we get a crash in one of the RCU threads because a
      callback is called after amdgpu has already been unloaded.
      
      Fix it with an rcu_barrier().
      
      Fixes: 189e0fb7 ("drm/amdgpu: RCU protected amd_sched_fence_release")
      Acked-by: Chunming Zhou <david1.zhou@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
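A single-threaded user-space model of the teardown ordering this commit describes: frees are queued for later (standing in for call_rcu()), and teardown first drains every queued callback (standing in for rcu_barrier()) before destroying the cache. It does not implement real RCU grace periods, and every name here (sketch_cache, sketch_deferred_queue, sketch_defer_free, sketch_barrier, sketch_teardown) is hypothetical, not the kernel slab or RCU API.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 16

/* Hypothetical model of a slab cache: counts objects not yet freed. */
struct sketch_cache {
    int live_objects;
    bool destroyed;
};

/* Hypothetical model of deferred (call_rcu-style) free callbacks. */
struct sketch_deferred_queue {
    struct sketch_cache *pending[SKETCH_MAX_PENDING];
    size_t count;
};

/* Queue a free to run later; returns false if the queue is full. */
static bool sketch_defer_free(struct sketch_deferred_queue *q, struct sketch_cache *c)
{
    if (q->count == SKETCH_MAX_PENDING)
        return false;
    q->pending[q->count++] = c;
    return true;
}

/* Model of a barrier: run every callback queued so far before returning. */
static void sketch_barrier(struct sketch_deferred_queue *q)
{
    for (size_t i = 0; i < q->count; i++)
        q->pending[i]->live_objects--;
    q->count = 0;
}

/* Teardown: wait for outstanding frees, then destroy the cache. */
static bool sketch_teardown(struct sketch_deferred_queue *q, struct sketch_cache *c)
{
    sketch_barrier(q);
    c->destroyed = true;
    return c->live_objects == 0; /* true if every object was freed first */
}
```

Without the sketch_barrier() call, teardown would mark the cache destroyed while free callbacks are still queued, which is the use-after-unload situation the commit message describes.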
  9. 20 Aug 2016, 1 commit
  10. 30 Jul 2016, 2 commits
  11. 08 Jul 2016, 15 commits
  12. 05 May 2016, 1 commit
  13. 03 May 2016, 6 commits
  14. 11 Feb 2016, 1 commit
  15. 15 Dec 2015, 1 commit
  16. 05 Dec 2015, 1 commit
  17. 03 Dec 2015, 1 commit