Commits · cfb83b1d9c38c29c3c89e8d242b8e7f0148d6c09 · openeuler / raspberrypi-kernel

05 Dec, 2017 8 commits

drm/amdgpu:fix gpu recover missing skipping(v2) · cfb83b1d

Committed by Monk Liu on Nov 08, 2017

If an app closes its CTX right after an IB submit, GPU recovery
fails to find the entity behind the guilty job, so that job is
never skipped.

To fix this corner case, move the increment of job->karma out of
the entity iteration.

v2:
Only increment karma if bad->s_priority != KERNEL, because we
always consider a KERNEL job to be correct and always want to
recover an unfinished kernel job (a kernel job is sometimes
interrupted by a VF FLR or another GPU hang event).

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cfb83b1d

amd/scheduler:imple job skip feature(v3) · 48f05f29

Committed by Monk Liu on Oct 25, 2017

Jobs are skipped in two cases:
1) When the entity behind a job is marked guilty, the job popped
from this entity's queue is dropped in the sched_main loop.

2) In job_recovery(), a job is not scheduled if its karma is found
to be above the limit, and other jobs sharing the same fence
context are skipped as well. This approach is needed because
job_recovery() cannot access job->entity, since the entity may
already be dead.

v2:
Some logic fixes.

v3:
When an entity is found guilty, don't drop the job at the popping
stage; instead set its fence error to -ECANCELED.

In run_job(), skip scheduling if either 1) fence->error < 0 or
2) VRAM was lost for this job. This way the job-skipping logic
is unified.

With this feature we can introduce the new GPU recovery feature.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

48f05f29
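
The skip rules described in the two entries above can be sketched in user-space C. This is a minimal illustration, not the scheduler's actual code: `struct job`, `blame_job()`, `mark_if_guilty()` and `should_skip()` are hypothetical names. Only the karma limit, the KERNEL-priority exception, the -ECANCELED fence error and the VRAM-lost check are taken from the commit messages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the scheduler's job state. */
enum job_priority { PRIORITY_NORMAL, PRIORITY_KERNEL };

struct job {
    enum job_priority priority;
    int karma;        /* how many hangs were blamed on this job */
    int fence_error;  /* 0, or a negative errno such as -ECANCELED */
    bool vram_lost;   /* VRAM was lost while this job was pending */
};

/* Blame a job for a hang; kernel jobs are always considered correct. */
static void blame_job(struct job *bad)
{
    assert(bad != NULL);
    if (bad->priority != PRIORITY_KERNEL)
        bad->karma++;
}

/* Mark the job's fence as cancelled once its karma is above the limit. */
static void mark_if_guilty(struct job *j, int hang_limit)
{
    assert(j != NULL);
    if (j->karma > hang_limit)
        j->fence_error = -ECANCELED;
}

/* Single skip decision, following the v3 description of run_job(). */
static bool should_skip(const struct job *j)
{
    return j->fence_error < 0 || j->vram_lost;
}
```

Keeping the decision in one predicate mirrors the stated goal of v3: every path that wants to drop a job only has to set the fence error, and the run path checks it in a single place.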

drm/amdgpu: Remove job->s_entity to avoid keeping reference to stale pointer. · a4176cb4

Committed by Andrey Grodzovsky on Oct 24, 2017

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a4176cb4

drm/amdgpu: Fix deadlock during GPU reset. · 83f4b118

Committed by Andrey Grodzovsky on Oct 12, 2017

Bug:
The kfifo is limited in size. During GPU reset it would fill up to
its limit, and the pushing thread (producer) would wait for the
scheduler worker to consume items from the fifo while holding the
reservation lock on a BO. The GPU reset thread, on the other hand,
blocks the scheduler during reset. Before it unblocks the scheduler
it might want to recover VRAM, and so will try to reserve the same
BO the producer thread is already holding, creating a deadlock.

Fix:
Switch from kfifo to an SPSC queue, which is unlimited in size.

Signed-off-by: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

83f4b118
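
The point of the fix above is that a push can never block on a full queue. A minimal sketch of an unbounded, intrusive linked queue shows why: the caller owns each node, so there is no capacity to exhaust. All names here (`struct node`, `struct queue`, `queue_push()`, `queue_pop()`) are hypothetical, and the sketch is single-threaded on purpose: it leaves out the atomic operations and memory barriers that a real single-producer/single-consumer queue needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive node: the link lives inside the queued item. */
struct node {
    struct node *next;
    int payload;
};

/* Unbounded FIFO: head for the consumer, tail slot for the producer. */
struct queue {
    struct node *head;
    struct node **tail;  /* address of the pointer the next push fills in */
    size_t count;
};

static void queue_init(struct queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->count = 0;
}

/* Never fails and never waits: the node is supplied by the caller. */
static void queue_push(struct queue *q, struct node *n)
{
    assert(n != NULL);
    n->next = NULL;
    *q->tail = n;
    q->tail = &n->next;
    q->count++;
}

/* Returns the oldest node, or NULL if the queue is empty. */
static struct node *queue_pop(struct queue *q)
{
    struct node *n = q->head;

    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = &q->head;  /* queue became empty, reset the tail slot */
    q->count--;
    return n;
}
```

With a fixed-size fifo the producer has to wait when the buffer is full, which is the wait that took part in the deadlock described above. Here the only resource a push needs is the node the producer already holds.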

drm/amdgpu:cleanup job reset routine(v2) · a8a51a70

Committed by Monk Liu on Oct 16, 2017

Merge the setting of guilty on the context into this function
to avoid implementing an extra routine.

v2:
Go through the entity list and compare the fence_ctx before
operating on the entity, otherwise the entity may be just a
wild pointer.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a8a51a70

drm/amd/scheduler:introduce guilty pointer member · b3eebe3d

Committed by Monk Liu on Oct 23, 2017

This member will be used later. It will point to the real
variable inside the context, so that CS_SUBMIT and the GPU scheduler
can decide whether to skip a job depending on context->guilty or
*entity->guilty.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

b3eebe3d

drm/amdgpu:add hang_limit for sched(v2) · 95aa9b1d

Committed by Monk Liu on Oct 17, 2017

Since the gpu_scheduler source domain cannot access amdgpu
variables, create a hang_limit member for sched, which it can
refer to in the upcoming GPU RESET patches.

v2:
Make hang_limit a parameter of sched_init().

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

95aa9b1d

drm/amdgpu: Avoid accessing job->entity after the job is scheduled. · d1f6dc1a

Committed by Andrey Grodzovsky on Oct 19, 2017

Bug: amdgpu_job_free_cb was accessing s_job->s_entity when the allocated
amdgpu_ctx (and the entity inside it) were already deallocated from
amdgpu_cs_parser_fini.

Fix: Save the job's priority at its creation instead of accessing it from
s_entity later on.

Signed-off-by: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

d1f6dc1a

25 Oct, 2017 1 commit

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05

Committed by Mark Rutland on Oct 23, 2017

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

6aa7de05
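
The effect of the two coccinelle rules above on ordinary code can be illustrated with a small user-space example. The `READ_ONCE()` and `WRITE_ONCE()` macros below are simplified stand-ins written for this sketch (they rely on the GCC/Clang `__typeof__` extension) and are not the kernel's definitions; `flag`, `set_flag()` and `get_flag()` are hypothetical.

```c
#include <assert.h>

/*
 * Simplified user-space stand-ins, assumed for illustration only.
 * The kernel's real macros are more involved.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int flag;

/* First rule: an assignment through ACCESS_ONCE() becomes WRITE_ONCE(). */
static void set_flag(int v)
{
    /* before: ACCESS_ONCE(flag) = v; */
    WRITE_ONCE(flag, v);
}

/* Second rule: every remaining ACCESS_ONCE() use becomes READ_ONCE(). */
static int get_flag(void)
{
    /* before: return ACCESS_ONCE(flag); */
    return READ_ONCE(flag);
}
```

The order of the rules matters: the assignment pattern has to be matched first, otherwise the generic rule would rewrite the left-hand side of a store into a `READ_ONCE()`.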

20 Oct, 2017 1 commit

drm/amd/sched: fix job tear down order v2 · 7fd5e36c

Committed by Christian König on Oct 13, 2017

Move the trace before we signal the scheduler fence and drop the
scheduler fence reference directly before we free the job.

v2: keep extra s_fence reference

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Liu, Monk <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

7fd5e36c

19 Oct, 2017 1 commit

Revert "drm/amdgpu: discard commands of killed processes" · c9450127

Committed by Alex Deucher on Oct 12, 2017

This causes instability in piglit. It's fixed in drm-next with:
515c6faf
1650c14b
214a91e6
29d25355
79867462

This reverts commit 6af0883e.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

c9450127

10 Oct, 2017 1 commit

drm/amd/sched: allow clients to edit an entity's rq v2 · 9ebbaabe

Committed by Andres Rodriguez on Jun 02, 2017

This is useful for changing an entity's priority at runtime.

v2: don't modify the order of amd_sched_entity members

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

9ebbaabe

07 Oct, 2017 5 commits

drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs · 79867462

Committed by Nicolai Hähnle on Sep 28, 2017

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

    [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
    [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
    [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
    [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
    [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
    [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
    [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
    [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
    [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
    [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
    [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
    [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
    [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
    [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
    [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
    [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
    [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

Note: The correctness of this change depends on the earlier commit
"drm/amd/sched: move adding finish callback to amd_sched_job_begin"

v2: set an error on the finished fence
Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

79867462

drm/amd/sched: NULL out the s_fence field after run_job · 29d25355

Committed by Nicolai Hähnle on Sep 28, 2017

amd_sched_process_job drops the fence reference, so NULL out the s_fence
field before adding it as a callback to guard against accidentally using
s_fence after it may have been freed.

v2: add a clarifying comment

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

29d25355

drm/amd/sched: move adding finish callback to amd_sched_job_begin · 214a91e6

Committed by Nicolai Hähnle on Sep 28, 2017

The finish callback is responsible for removing the job from the ring
mirror list, among other things. It makes sense to add it as a callback
in the place where the job is added to the ring mirror list.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

214a91e6

drm/amd/sched: fix an outdated comment · 1650c14b

Committed by Nicolai Hähnle on Sep 28, 2017

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1650c14b

drm/amd/sched: rename amd_sched_entity_pop_job · 515c6faf

Committed by Nicolai Hähnle on Sep 28, 2017

The function does not actually remove the job from the FIFO, so "peek"
describes it better.

Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

515c6faf

30 Aug, 2017 1 commit

drm/amdgpu: discard commands of killed processes · f0694d3b

Committed by Christian König on Aug 21, 2017

When a process is killed we shouldn't submit all waiting jobs, but instead
clean up as fast as possible.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

f0694d3b

24 Aug, 2017 1 commit

drm/amdgpu: discard commands of killed processes · 6af0883e

Committed by Christian König on Aug 21, 2017

When a process is killed we shouldn't submit all waiting jobs, but instead
clean up as fast as possible.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6af0883e

25 May, 2017 1 commit

drm/amdgpu/SRIOV:implement guilty job TDR for(V2) · 65781c78

Committed by Monk Liu on May 11, 2017

1. TDR will kick out a guilty job if its hangs exceed the
threshold given by the kernel parameter "job_hang_limit", so that
a bad command stream will not cause GPU hangs indefinitely.

By default this threshold is 1, so a job will be kicked out
after it hangs.

2. If a job times out, the TDR routine will not reset every
sched/ring; instead it will only reset the given one, which is
indicated by @job of amdgpu_sriov_gpu_reset. That way we don't need
to reset and recover each sched/ring if we already know which job
caused the GPU hang.

3. Unblock sriov_gpu_reset for the AI family.

V2:
1. Kick out the guilty job after the sched is parked.
2. Since parking the scheduler prior to the kickout already takes a
while, we can do a last check on the job in question before
doing hw_reset.

TODO:
1. When a job is considered guilty, we should set a flag
in its fence status flags and make the UMD side aware that this
fence signaling is due to a job hang, not job completion.

2. If a GPU reset causes all video memory to be lost, we need to
introduce a new policy to implement TDR, such as dropping all jobs
not yet signaled and having every IOCTL on this device return
ERROR DEVICE_LOST.
This will be implemented later.

Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

65781c78

11 May, 2017 2 commits

drm/amdgpu: fix dependency issue · 30514dec

Committed by Chunming Zhou on May 09, 2017

The problem is that executing the jobs in the right order doesn't give you the right result,
because consecutive jobs executed on the same engine are pipelined.
In other words, job B does its buffer read before job A has written its result.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

30514dec

drm/amd: fix init order of sched job · cb3696fd

Committed by Chunming Zhou on May 09, 2017

Need to increment after the fence check.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

cb3696fd

29 Apr, 2017 1 commit

drm/amdgpu: fix NULL pointer error · a6bef67e

Committed by Chunming Zhou on Apr 24, 2017

[  141.420491] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  141.420532] IP: [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.420563] PGD 20a030067
[  141.420575] PUD 2088ca067
[  141.420587] PMD 0

[  141.420599] Oops: 0000 [#1] SMP
[  141.420612] Modules linked in: amdgpu(OE) ttm(OE) drm_kms_helper(E) drm(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) rpcsec_gss_krb5(E) nfsv4(E) nfs(E) fscache(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) snd_hda_codec_realtek(E) video(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) joydev(E) snd_hda_codec(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_hda_core(E) snd_hwdep(E) snd_rawmidi(E) snd_pcm(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) snd_seq(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_seq_device(E) snd_timer(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd(E) soundcore(E) serio_raw(E) shpchp(E) i2c_piix4(E) i2c_designware_platform(E) 8250_dw(E) i2c_designware_core(E) mac_hid(E) binfmt_misc(E)
[  141.420948]  nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) r8169(E) ahci(E) mii(E) libahci(E) wmi(E)
[  141.421042] CPU: 14 PID: 223 Comm: kworker/14:2 Tainted: G           OE   4.9.0-custom #4
[  141.421074] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 0606 04/06/2017
[  141.421146] Workqueue: events amd_sched_job_timedout [amdgpu]
[  141.421169] task: ffff88020b03ba80 task.stack: ffffc900016f4000
[  141.421193] RIP: 0010:[<ffffffff81579ee1>]  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.421229] RSP: 0018:ffffc900016f7d30  EFLAGS: 00010202
[  141.421250] RAX: ffff8801c049fc00 RBX: ffff8801d4d8dc00 RCX: 0000000000000000
[  141.421278] RDX: 0000000000000001 RSI: ffff8801c049fcc0 RDI: 0000000000000000
[  141.421307] RBP: ffffc900016f7d48 R08: 0000000000000000 R09: 0000000000000000
[  141.421334] R10: 00000020ed512a30 R11: 0000000000000001 R12: 0000000000000000
[  141.421362] R13: ffff880209ba4ba0 R14: ffff880209ba4c58 R15: ffff8801c055cc60
[  141.421390] FS:  0000000000000000(0000) GS:ffff88021ef80000(0000) knlGS:0000000000000000
[  141.421421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  141.421443] CR2: 0000000000000030 CR3: 000000020b554000 CR4: 00000000003406e0
[  141.421471] Stack:
[  141.421480]  ffff8801d4d8dc00 ffff880209ba4c48 ffff880209ba4ba0 ffffc900016f7d78
[  141.421513]  ffffffffa0697920 ffff880209ba0000 0000000000000000 ffff880209ba2770
[  141.421549]  ffff880209ba4b08 ffffc900016f7df0 ffffffffa05ce2ae ffffffffa0509eb7
[  141.421583] Call Trace:
[  141.421628]  [<ffffffffa0697920>] amd_sched_hw_job_reset+0x50/0xb0 [amdgpu]
[  141.421676]  [<ffffffffa05ce2ae>] amdgpu_gpu_reset+0x8e/0x690 [amdgpu]
[  141.421712]  [<ffffffffa0509eb7>] ? drm_printk+0x97/0xa0 [drm]
[  141.421770]  [<ffffffffa0698156>] amdgpu_job_timedout+0x46/0x50 [amdgpu]
[  141.421829]  [<ffffffffa0696a07>] amd_sched_job_timedout+0x17/0x20 [amdgpu]
[  141.421859]  [<ffffffff81095493>] process_one_work+0x153/0x3f0
[  141.421884]  [<ffffffff81095c5b>] worker_thread+0x12b/0x4b0
[  141.421907]  [<ffffffff81095b30>] ? rescuer_thread+0x350/0x350
[  141.421931]  [<ffffffff8109b423>] kthread+0xd3/0xf0
[  141.421951]  [<ffffffff8109b350>] ? kthread_park+0x60/0x60
[  141.421975]  [<ffffffff817e1ee5>] ret_from_fork+0x25/0x30
[  141.421996] Code: ac 81 e8 a3 1f b0 ff 48 c7 c0 ea ff ff ff e9 48 ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 <48> 8b 7f 30 48 89 f3 e8 73 7c 26 00 48 8b 13 48 39 d3 41 0f 95
[  141.422156] RIP  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.422183]  RSP <ffffc900016f7d30>
[  141.422197] CR2: 0000000000000030
[  141.433483] ---[ end trace bc0949bf7ddd6d4b ]---

if the job is reset twice, then the parent could be NULL.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a6bef67e
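
The entry above does not show the patch itself, so the following is only a guess at the shape of the guard it implies: check the parent fence for NULL before touching it, so that resetting the same job twice is harmless. `struct hw_fence`, `struct sched_fence`, `hw_fence_remove_callback()` and `job_reset()` are hypothetical names used for this user-space sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the hardware fence and the scheduler fence. */
struct hw_fence {
    int callbacks;  /* number of callbacks currently installed */
};

struct sched_fence {
    struct hw_fence *parent;  /* hw fence that signals this sched fence */
};

/* Remove one callback; like the oops above, it must not be given NULL. */
static bool hw_fence_remove_callback(struct hw_fence *f)
{
    assert(f != NULL);
    if (f->callbacks == 0)
        return false;
    f->callbacks--;
    return true;
}

/* Detach a job from its hw fence; safe to call more than once. */
static bool job_reset(struct sched_fence *s)
{
    if (s->parent == NULL)  /* already reset: nothing left to detach */
        return false;
    hw_fence_remove_callback(s->parent);
    s->parent = NULL;
    return true;
}
```

Without the NULL check, the second call would pass a NULL fence to the removal helper, which matches the zero RDI and the small faulting address in the trace.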

30 Mar, 2017 2 commits

drm/amd/sched: revise priority number · 153de9df

Committed by Chunming Zhou on Mar 16, 2017

A bigger number now means a higher priority.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

153de9df

drm/amd/sched: add a unique job id to amd_sched_job · 93f8b367

Committed by Andres Rodriguez on Mar 09, 2017

A unique id is useful for debugging and tracing. Intended to replace
pointers in ftrace output.

Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

93f8b367

02 Mar, 2017 1 commit

sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> · ae7e81c0

Committed by Ingo Molnar on Feb 01, 2017

We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

ae7e81c0

01 Nov, 2016 1 commit

drm/amd: fix scheduler fence teardown order v2 · c24784f0

Committed by Christian König on Oct 28, 2016

Some fences might be alive even after we have stopped the scheduler leading
to warnings about leaked objects from the SLUB allocator.

Fix this by allocating/freeing the SLUB allocator from the module
init/fini functions just like we do it for hw fences.

v2: make variable static, add link to bug

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=97500
Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

c24784f0

25 Oct, 2016 2 commits

dma-buf: Rename struct fence to dma_fence · f54d1867

Committed by Chris Wilson on Oct 25, 2016

I plan to usurp the short name of struct fence for a core kernel struct,
and so I need to rename the specialised fence/timeline for DMA
operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making clear this fence applies to DMA operations was a good thing.
Since then the patch has grown a bit as usage increases, so hopefully it
remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:
@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk

f54d1867

drm/amdgpu: fix sched fence slab teardown · a053fb7e

Committed by Grazvydas Ignotas on Oct 23, 2016

To free fences, call_rcu() is used, which calls amd_sched_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so sched_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings, if not, we get a crash in one of rcu threads because callback
is called after amdgpu has already been unloaded.

Fix it with a rcu_barrier().

Fixes: 189e0fb7 ("drm/amdgpu: RCU protected amd_sched_fence_release")
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

a053fb7e

20 Aug, 2016 1 commit

drm/amdgpu: fix timeout value check in amd_sched_job_recovery · bdf00137

Committed by Christian König on Aug 16, 2016

Could be that we don't actually have a timeout set.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bdf00137

30 Jul, 2016 2 commits

drm/amd: fix deadlock of job_list_lock V2 · 1c62cf91

Committed by Chunming Zhou on Jul 25, 2016

run_job involves a mutex, which could sleep.

V2: use list_for_each_entry_safe, since the job might complete
while we dropped the lock.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1c62cf91

drm/amd: reset hw count when reset job · bdc2eea4

Committed by Chunming Zhou on Jul 22, 2016

This reflects that the hw ring is empty after a GPU reset.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

bdc2eea4

08 Jul, 2016 8 commits

drm/amdgpu: add amd_sched_job_recovery · ec75f573

Committed by Chunming Zhou on Jun 29, 2016

This recovers hw jobs on GPU reset.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

ec75f573

drm/amd: add amd_sched_hw_job_reset · e686e75d

Committed by Chunming Zhou on Jun 30, 2016

amd_sched_hw_job_reset removes the callback from the hw fence.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

e686e75d

drm/amd: add parent for sched fence · 754ce0fa

Committed by Chunming Zhou on Jun 30, 2016

The parent of a sched fence is the hw fence that signals the sched fence.

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

754ce0fa

drm/amdgpu: remove fence parameter from amd_sched_job_init · 595a9cd6

Committed by Christian König on Jun 30, 2016

We return the fence as part of the job structure anyway,
no need to do this twice.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

595a9cd6

drm/amdgpu: stop disabling irqs when it isn't neccessary · 1059e117

Committed by Christian König on Jun 13, 2016

A regular spin_lock/unlock should do here as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

1059e117

drm/amdgpu: block scheduler when gpu reset · 0875dc9e

Committed by Chunming Zhou on Jun 12, 2016

Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

0875dc9e

drm/amdgpu: stop trying to schedule() with a spin held · a8bd3e1c

Committed by Christian König on Jun 13, 2016

Drop the lock before calling cancel_delayed_work_sync().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96445
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Tested-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
a8bd3e1c

drm/amdgpu: generalize the scheduler fence · 6fc13675

Committed by Christian König on May 20, 2016

Make it two events, one for the job being scheduled and one when it is finished.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

6fc13675