提交 · 95aa9b1d9738faa80c66df41d59358d5ff4c288a · openeuler / raspberrypi-kernel

05 12月, 2017 2 次提交

drm/amdgpu:add hang_limit for sched(v2) · 95aa9b1d

由 Monk Liu 提交于 10月 17, 2017

since gpu_scheduler source domain cannot access amdgpu variable
so need create the hang_limit membewr for sched, and it can
refer it for the upcoming GPU RESET patches

v2:
make hang_limit a parameter of sched_init()
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NChunming Zhou <David1.Zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

95aa9b1d

drm/amdgpu: Avoid accessing job->entity after the job is scheduled. · d1f6dc1a

由 Andrey Grodzovsky 提交于 10月 19, 2017

Bug: amdgpu_job_free_cb was accessing s_job->s_entity when the allocated
amdgpu_ctx (and the entity inside it) were already deallocated from
amdgpu_cs_parser_fini.

Fix: Save job's priority on it's creation instead of accessing it from
s_entity later on.
Signed-off-by: NAndrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

d1f6dc1a

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

25 10月, 2017 1 次提交

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05

由 Mark Rutland 提交于 10月 23, 2017

locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6aa7de05

20 10月, 2017 1 次提交

drm/amd/sched: fix job tear down order v2 · 7fd5e36c

由 Christian König 提交于 10月 13, 2017

Move the trace before we signal the scheduler fence and drop the
scheduler fence reference directly before we free the job.

v2: keep extra s_fence reference
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NLiu, Monk <Monk.Liu@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

7fd5e36c

19 10月, 2017 1 次提交

Revert "drm/amdgpu: discard commands of killed processes" · c9450127

由 Alex Deucher 提交于 10月 12, 2017

This causes instability in piglit.  It's fixed in drm-next with:
515c6faf
1650c14b
214a91e6
29d25355
79867462

This reverts commit 6af0883e.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c9450127

10 10月, 2017 5 次提交

drm/amdgpu: introduce AMDGPU_CTX_PRIORITY_UNSET · f3d19bf8

由 Andres Rodriguez 提交于 6月 26, 2017

Use _INVALID to identify bad parameters and _UNSET to represent the
lack of interest in a specific value.
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f3d19bf8

drm/amd/sched: allow clients to edit an entity's rq v2 · 9ebbaabe

由 Andres Rodriguez 提交于 6月 02, 2017

This is useful for changing an entity's priority at runtime.

v2: don't modify the order of amd_sched_entity members
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

9ebbaabe

drm/amdgpu: make amdgpu_to_sched_priority detect invalid parameters · b6d8a439

由 Andres Rodriguez 提交于 5月 24, 2017

Returning invalid priorities as _NORMAL is a backwards compatibility
quirk of amdgpu_ctx_ioctl(). Move this detail one layer up where it
belongs.
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b6d8a439

drm/amdgpu: add framework for HW specific priority settings v9 · b2ff0e8a

由 Andres Rodriguez 提交于 2月 20, 2017

Add an initial framework for changing the HW priorities of rings. The
framework allows requesting priority changes for the lifetime of an
amdgpu_job. After the job completes the priority will decay to the next
lowest priority for which a request is still valid.

A new ring function set_priority() can now be populated to take care of
the HW specific programming sequence for priority changes.

v2: set priority before emitting IB, and take a ref on amdgpu_job
v3: use AMD_SCHED_PRIORITY_* instead of AMDGPU_CTX_PRIORITY_*
v4: plug amdgpu_ring_restore_priority_cb into amdgpu_job_free_cb
v5: use atomic for tracking job priorities instead of last_job
v6: rename amdgpu_ring_priority_[get/put]() and align parameters
v7: replace spinlocks with mutexes for KIQ compatibility
v8: raise ring priority during cs_ioctl, instead of job_run
v9: priority_get() before push_job()
Reviewed-by: NChristian König <christian.koenig@amd.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

b2ff0e8a

drm/amdgpu: add parameter to allocate high priority contexts v11 · c2636dc5

由 Andres Rodriguez 提交于 12月 22, 2016

Add a new context creation parameter to express a global context priority.

The priority ranking in descending order is as follows:
 * AMDGPU_CTX_PRIORITY_HIGH_HW
 * AMDGPU_CTX_PRIORITY_HIGH_SW
 * AMDGPU_CTX_PRIORITY_NORMAL
 * AMDGPU_CTX_PRIORITY_LOW_SW
 * AMDGPU_CTX_PRIORITY_LOW_HW

The driver will attempt to schedule work to the hardware according to
the priorities. No latency or throughput guarantees are provided by
this patch.

This interface intends to service the EGL_IMG_context_priority
extension, and vulkan equivalents.

Setting a priority above NORMAL requires CAP_SYS_NICE or DRM_MASTER.

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
v5: Move priority validation into amdgpu_ctx_ioctl(), headline reword
v6: add UAPI note regarding priorities requiring CAP_SYS_ADMIN
v7: remove ctx->priority
v8: added AMDGPU_CTX_PRIORITY_LOW, s/CAP_SYS_ADMIN/CAP_SYS_NICE
v9: change the priority parameter to __s32
v10: split priorities into _SW and _HW
v11: Allow DRM_MASTER without CAP_SYS_NICE
Reviewed-by: NEmil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

c2636dc5

07 10月, 2017 5 次提交

drm/amd/sched: fix deadlock caused by unsignaled fences of deleted jobs · 79867462

由 Nicolai Hähnle 提交于 9月 28, 2017

Highly concurrent Piglit runs can trigger a race condition where a pending
SDMA job on a buffer object is never executed because the corresponding
process is killed (perhaps due to a crash). Since the job's fences were
never signaled, the buffer object was effectively leaked. Worse, the
buffer was stuck wherever it happened to be at the time, possibly in VRAM.

The symptom was user space processes stuck in interruptible waits with
kernel stacks like:

    [<ffffffffbc5e6722>] dma_fence_default_wait+0x112/0x250
    [<ffffffffbc5e6399>] dma_fence_wait_timeout+0x39/0xf0
    [<ffffffffbc5e82d2>] reservation_object_wait_timeout_rcu+0x1c2/0x300
    [<ffffffffc03ce56f>] ttm_bo_cleanup_refs_and_unlock+0xff/0x1a0 [ttm]
    [<ffffffffc03cf1ea>] ttm_mem_evict_first+0xba/0x1a0 [ttm]
    [<ffffffffc03cf611>] ttm_bo_mem_space+0x341/0x4c0 [ttm]
    [<ffffffffc03cfc54>] ttm_bo_validate+0xd4/0x150 [ttm]
    [<ffffffffc03cffbd>] ttm_bo_init_reserved+0x2ed/0x420 [ttm]
    [<ffffffffc042f523>] amdgpu_bo_create_restricted+0x1f3/0x470 [amdgpu]
    [<ffffffffc042f9fa>] amdgpu_bo_create+0xda/0x220 [amdgpu]
    [<ffffffffc04349ea>] amdgpu_gem_object_create+0xaa/0x140 [amdgpu]
    [<ffffffffc0434f97>] amdgpu_gem_create_ioctl+0x97/0x120 [amdgpu]
    [<ffffffffc037ddba>] drm_ioctl+0x1fa/0x480 [drm]
    [<ffffffffc041904f>] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
    [<ffffffffbc23db33>] do_vfs_ioctl+0xa3/0x5f0
    [<ffffffffbc23e0f9>] SyS_ioctl+0x79/0x90
    [<ffffffffbc864ffb>] entry_SYSCALL_64_fastpath+0x1e/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

Note: The correctness of this change depends on the earlier commit
"drm/amd/sched: move adding finish callback to amd_sched_job_begin"

v2: set an error on the finished fence
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

79867462

drm/amd/sched: NULL out the s_fence field after run_job · 29d25355

由 Nicolai Hähnle 提交于 9月 28, 2017

amd_sched_process_job drops the fence reference, so NULL out the s_fence
field before adding it as a callback to guard against accidentally using
s_fence after it may have be freed.

v2: add a clarifying comment
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

29d25355

drm/amd/sched: move adding finish callback to amd_sched_job_begin · 214a91e6

由 Nicolai Hähnle 提交于 9月 28, 2017

The finish callback is responsible for removing the job from the ring
mirror list, among other things. It makes sense to add it as callback
in the place where the job is added to the ring mirror list.
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

214a91e6

drm/amd/sched: fix an outdated comment · 1650c14b

由 Nicolai Hähnle 提交于 9月 28, 2017

Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1650c14b

drm/amd/sched: rename amd_sched_entity_pop_job · 515c6faf

由 Nicolai Hähnle 提交于 9月 28, 2017

The function does not actually remove the job from the FIFO, so "peek"
describes it better.
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

515c6faf

30 8月, 2017 1 次提交

drm/amdgpu: discard commands of killed processes · f0694d3b

由 Christian König 提交于 8月 21, 2017

When a process is killed we shouldn't submit all waiting jobs, but instead
clean up as fast as possible.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

f0694d3b

24 8月, 2017 1 次提交

drm/amdgpu: discard commands of killed processes · 6af0883e

由 Christian König 提交于 8月 21, 2017

When a process is killed we shouldn't submit all waiting jobs, but instead
clean up as fast as possible.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

6af0883e

14 7月, 2017 1 次提交

drm/amd/sched: print sched job id in amd_sched_job trace · eabd76ce

由 Nicolai Hähnle 提交于 6月 13, 2017

This makes it easier to correlate amd_sched_job with with other trace
points that don't log the job pointer.

v2: don't print the sched_job pointer (Andres)
Signed-off-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: NAndres Rodriguez <andresx7@gmail.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>

eabd76ce

25 5月, 2017 1 次提交

drm/amdgpu/SRIOV:implement guilty job TDR for(V2) · 65781c78

由 Monk Liu 提交于 5月 11, 2017

1,TDR will kickout guilty job if it hang exceed the threshold
of the given one from kernel paramter "job_hang_limit", that
way a bad command stream will not infinitly cause GPU hang.

by default this threshold is 1 so a job will be kicked out
after it hang.

2,if a job timeout TDR routine will not reset all sched/ring,
instead if will only reset on the givn one which is indicated
by @job of amdgpu_sriov_gpu_reset, that way we don't need to
reset and recover each sched/ring if we already know which job
cause GPU hang.

3,unblock sriov_gpu_reset for AI family.

V2:
1:put kickout guilty job after sched parked.
2:since parking scheduler prior to kickout already occupies a
while, we can do last check on the in question job before
doing hw_reset.

TODO:
1:when a job is considered as guilty, we should mark some flag
in its fence status flag, and let UMD side aware that this
fence signaling is not due to job complete but job hang.

2:if gpu reset cause all video memory lost, we need introduce
a new policy to implement TDR, like drop all jobs not yet
signaled, and all IOCTL on this device will return ERROR
DEVICE_LOST.
this will be implemented later.
Signed-off-by: NMonk Liu <Monk.Liu@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

65781c78

11 5月, 2017 2 次提交

drm/amdgpu: fix dependency issue · 30514dec

由 Chunming Zhou 提交于 5月 09, 2017

The problem is that executing the jobs in the right order doesn't give you the right result
because consecutive jobs executed on the same engine are pipelined.
In other words job B does it buffer read before job A has written it's result.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

30514dec

drm/amd: fix init order of sched job · cb3696fd

由 Chunming Zhou 提交于 5月 09, 2017

Need to increment after the fence check.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NJunwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

cb3696fd

29 4月, 2017 1 次提交

drm/amdgpu: fix NULL pointer error · a6bef67e

由 Chunming Zhou 提交于 4月 24, 2017

[  141.420491] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[  141.420532] IP: [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.420563] PGD 20a030067
[  141.420575] PUD 2088ca067
[  141.420587] PMD 0

[  141.420599] Oops: 0000 [#1] SMP
[  141.420612] Modules linked in: amdgpu(OE) ttm(OE) drm_kms_helper(E) drm(E) i2c_algo_bit(E) fb_sys_fops(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) rpcsec_gss_krb5(E) nfsv4(E) nfs(E) fscache(E) eeepc_wmi(E) asus_wmi(E) sparse_keymap(E) snd_hda_codec_realtek(E) video(E) snd_hda_codec_generic(E) snd_hda_codec_hdmi(E) snd_hda_intel(E) joydev(E) snd_hda_codec(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_hda_core(E) snd_hwdep(E) snd_rawmidi(E) snd_pcm(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) snd_seq(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_seq_device(E) snd_timer(E) aesni_intel(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) snd(E) soundcore(E) serio_raw(E) shpchp(E) i2c_piix4(E) i2c_designware_platform(E) 8250_dw(E) i2c_designware_core(E) mac_hid(E) binfmt_misc(E)
[  141.420948]  nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) parport_pc(E) ppdev(E) lp(E) parport(E) autofs4(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) r8169(E) ahci(E) mii(E) libahci(E) wmi(E)
[  141.421042] CPU: 14 PID: 223 Comm: kworker/14:2 Tainted: G           OE   4.9.0-custom #4
[  141.421074] Hardware name: System manufacturer System Product Name/PRIME B350-PLUS, BIOS 0606 04/06/2017
[  141.421146] Workqueue: events amd_sched_job_timedout [amdgpu]
[  141.421169] task: ffff88020b03ba80 task.stack: ffffc900016f4000
[  141.421193] RIP: 0010:[<ffffffff81579ee1>]  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.421229] RSP: 0018:ffffc900016f7d30  EFLAGS: 00010202
[  141.421250] RAX: ffff8801c049fc00 RBX: ffff8801d4d8dc00 RCX: 0000000000000000
[  141.421278] RDX: 0000000000000001 RSI: ffff8801c049fcc0 RDI: 0000000000000000
[  141.421307] RBP: ffffc900016f7d48 R08: 0000000000000000 R09: 0000000000000000
[  141.421334] R10: 00000020ed512a30 R11: 0000000000000001 R12: 0000000000000000
[  141.421362] R13: ffff880209ba4ba0 R14: ffff880209ba4c58 R15: ffff8801c055cc60
[  141.421390] FS:  0000000000000000(0000) GS:ffff88021ef80000(0000) knlGS:0000000000000000
[  141.421421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  141.421443] CR2: 0000000000000030 CR3: 000000020b554000 CR4: 00000000003406e0
[  141.421471] Stack:
[  141.421480]  ffff8801d4d8dc00 ffff880209ba4c48 ffff880209ba4ba0 ffffc900016f7d78
[  141.421513]  ffffffffa0697920 ffff880209ba0000 0000000000000000 ffff880209ba2770
[  141.421549]  ffff880209ba4b08 ffffc900016f7df0 ffffffffa05ce2ae ffffffffa0509eb7
[  141.421583] Call Trace:
[  141.421628]  [<ffffffffa0697920>] amd_sched_hw_job_reset+0x50/0xb0 [amdgpu]
[  141.421676]  [<ffffffffa05ce2ae>] amdgpu_gpu_reset+0x8e/0x690 [amdgpu]
[  141.421712]  [<ffffffffa0509eb7>] ? drm_printk+0x97/0xa0 [drm]
[  141.421770]  [<ffffffffa0698156>] amdgpu_job_timedout+0x46/0x50 [amdgpu]
[  141.421829]  [<ffffffffa0696a07>] amd_sched_job_timedout+0x17/0x20 [amdgpu]
[  141.421859]  [<ffffffff81095493>] process_one_work+0x153/0x3f0
[  141.421884]  [<ffffffff81095c5b>] worker_thread+0x12b/0x4b0
[  141.421907]  [<ffffffff81095b30>] ? rescuer_thread+0x350/0x350
[  141.421931]  [<ffffffff8109b423>] kthread+0xd3/0xf0
[  141.421951]  [<ffffffff8109b350>] ? kthread_park+0x60/0x60
[  141.421975]  [<ffffffff817e1ee5>] ret_from_fork+0x25/0x30
[  141.421996] Code: ac 81 e8 a3 1f b0 ff 48 c7 c0 ea ff ff ff e9 48 ff ff ff 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 53 <48> 8b 7f 30 48 89 f3 e8 73 7c 26 00 48 8b 13 48 39 d3 41 0f 95
[  141.422156] RIP  [<ffffffff81579ee1>] fence_remove_callback+0x11/0x60
[  141.422183]  RSP <ffffc900016f7d30>
[  141.422197] CR2: 0000000000000030
[  141.433483] ---[ end trace bc0949bf7ddd6d4b ]---

if the job is reset twice, then the parent could be NULL.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a6bef67e

30 3月, 2017 2 次提交

drm/amd/sched: revise priority number · 153de9df

由 Chunming Zhou 提交于 3月 16, 2017

big number is to high priority.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

153de9df

drm/amd/sched: add a unique job id to amd_sched_job · 93f8b367

由 Andres Rodriguez 提交于 3月 09, 2017

A unique id is useful for debugging and tracing. Intended to replace
pointers in ftrace output.
Reviewed-by: NChunming Zhou <david1.zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAndres Rodriguez <andresx7@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

93f8b367

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h> · ae7e81c0

由 Ingo Molnar 提交于 2月 01, 2017

We are going to move scheduler ABI details to <uapi/linux/sched/types.h>,
which will be used from a number of .c files.

Create empty placeholder header that maps to <linux/types.h>.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ae7e81c0

01 11月, 2016 1 次提交

drm/amd: fix scheduler fence teardown order v2 · c24784f0

由 Christian König 提交于 10月 28, 2016

Some fences might be alive even after we have stopped the scheduler leading
to warnings about leaked objects from the SLUB allocator.

Fix this by allocating/freeing the SLUB allocator from the module
init/fini functions just like we do it for hw fences.

v2: make variable static, add link to bug

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=97500Reported-by: NGrazvydas Ignotas <notasas@gmail.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

c24784f0

25 10月, 2016 3 次提交

dma-buf: Rename struct fence to dma_fence · f54d1867

由 Chris Wilson 提交于 10月 25, 2016

I plan to usurp the short name of struct fence for a core kernel struct,
and so I need to rename the specialised fence/timeline for DMA
operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making clear this fence applies to DMA operations was a good thing.
Since then the patch has grown a bit as usage increases, so hopefully it
remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:
@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: NGustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: NSumit Semwal <sumit.semwal@linaro.org>
Acked-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk

f54d1867

drm/amdgpu: update kernel-doc for some functions · 95662133

由 Grazvydas Ignotas 提交于 10月 23, 2016

The names were wrong.
Reviewed-by: NEdward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

95662133

drm/amdgpu: fix sched fence slab teardown · a053fb7e

由 Grazvydas Ignotas 提交于 10月 23, 2016

To free fences, call_rcu() is used, which calls amd_sched_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so sched_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings, if not, we get a crash in one of rcu threads because callback
is called after amdgpu has already been unloaded.

Fix it with a rcu_barrier().

Fixes: 189e0fb7 ("drm/amdgpu: RCU protected amd_sched_fence_release")
Acked-by: NChunming Zhou <david1.zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a053fb7e

20 8月, 2016 1 次提交

drm/amdgpu: fix timeout value check in amd_sched_job_recovery · bdf00137

由 Christian König 提交于 8月 16, 2016

Could be that we don't actually have a timeout set.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Acked-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bdf00137

30 7月, 2016 2 次提交

drm/amd: fix deadlock of job_list_lock V2 · 1c62cf91

由 Chunming Zhou 提交于 7月 25, 2016

run_job involves mutex, which could sleep.

V2: use list_for_each_entry_safe, since the job might complete
while we dropped the lock.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1c62cf91

drm/amd: reset hw count when reset job · bdc2eea4

由 Chunming Zhou 提交于 7月 22, 2016

Means the hw ring is empty after gpu reset.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

bdc2eea4

08 7月, 2016 7 次提交

drm/amdgpu: add amd_sched_job_recovery · ec75f573

由 Chunming Zhou 提交于 6月 29, 2016

Which is to recover hw jobs when gpu reset.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

ec75f573

drm/amd: add amd_sched_hw_job_reset · e686e75d

由 Chunming Zhou 提交于 6月 30, 2016

amd_sched_hw_job_reset will remove callback from hw fence.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

e686e75d

drm/amd: add parent for sched fence · 754ce0fa

由 Chunming Zhou 提交于 6月 30, 2016

Parent of sched fence is hw fence which is to signal sched fence.
Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

754ce0fa

drm/amdgpu: remove fence parameter from amd_sched_job_init · 595a9cd6

由 Christian König 提交于 6月 30, 2016

We return the fence as part of the job structur anyway,
no need to do this twice.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NChunming Zhou <david1.zhou@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

595a9cd6

drm/amdgpu: stop disabling irqs when it isn't neccessary · 1059e117

由 Christian König 提交于 6月 13, 2016

A regular spin_lock/unlock should do here as well.
Signed-off-by: NChristian König <christian.koenig@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

1059e117

drm/amdgpu: block scheduler when gpu reset · 0875dc9e

由 Chunming Zhou 提交于 6月 12, 2016

Signed-off-by: NChunming Zhou <David1.Zhou@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

0875dc9e

drm/amdgpu: stop trying to schedule() with a spin held · a8bd3e1c

由 Christian König 提交于 6月 13, 2016

Drop the lock before calling cancel_delayed_work_sync().

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96445Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NChristian König <christian.koenig@amd.com>
Tested-by: NNicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>

a8bd3e1c