- 29 January 2021, 1 commit
-
-
Submitted by Luben Tuikov

This patch does not change current behaviour. The driver's job timeout handler now returns a status indicating back to the DRM layer whether the device (GPU) is no longer available, such as after it's been unplugged, or whether all is normal, i.e. current behaviour.

All drivers which make use of the drm_sched_backend_ops' .timedout_job() callback have been accordingly renamed and return the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to restart the task's timeout timer -- this is the old behaviour, and is preserved by this patch.

v2: Use an enum as the status of a driver's job timeout callback method.
v3: Return scheduler/device information, rather than task information.

Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Qiang Yu <yuq825@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Eric Anholt <eric@anholt.net>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Steven Price <steven.price@arm.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/415095/
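A hedged sketch of what a driver's timeout handler looks like after this change; the handler name is a placeholder, while the enum values and the callback signature follow include/drm/gpu_scheduler.h as this patch shapes it:

    static enum drm_gpu_sched_stat
    my_timedout_job(struct drm_sched_job *sched_job)    /* placeholder name */
    {
            /* ... reset the hung engine, resubmit unfinished jobs ... */

            /* Device still present: restart the timeout timer, i.e. the
             * old default behaviour. A hot-unplugged device would return
             * DRM_GPU_SCHED_STAT_ENODEV instead. */
            return DRM_GPU_SCHED_STAT_NOMINAL;
    }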
-
- 19 January 2021, 1 commit
-
-
Submitted by Andrey Grodzovsky

To avoid any possible use-after-free.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/414814/
CC: stable@vger.kernel.org
Signed-off-by: Christian König <christian.koenig@amd.com>
-
- 08 December 2020, 3 commits
-
-
Submitted by Luben Tuikov

The job-done callback is called from various places, in two roles: as the job-done handler, and as a fence callback. Split the callback into a core function which just completes the job, and a second function, in the shape of a fence callback, which calls the core function to complete the job. This is used in later patches by the completion code.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/405574/
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
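A hedged sketch of that split, with bodies elided; the two function names follow the scheduler code, everything else is abbreviated:

    /* Core helper: complete the job (signal s_fence, wake the scheduler). */
    static void drm_sched_job_done(struct drm_sched_job *s_job)
    {
            /* ... bookkeeping and drm_sched_fence_finished() ... */
    }

    /* Fence-callback-shaped wrapper which delegates to the core helper. */
    static void drm_sched_job_done_cb(struct dma_fence *f,
                                      struct dma_fence_cb *cb)
    {
            struct drm_sched_job *s_job =
                    container_of(cb, struct drm_sched_job, cb);

            drm_sched_job_done(s_job);
    }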
-
Submitted by Luben Tuikov

Rename "ring_mirror_list" to "pending_list", to describe what something is, not what it does, how it's used, or how the hardware implements it. This also abstracts the actual hardware implementation, i.e. how the low-level driver communicates with the device it drives (ring, CAM, etc.), which shouldn't be exposed to DRM. The pending_list keeps submitted jobs, which are out of our control. Usually this means they are pending execution in hardware, but the chosen name is the more general (inclusive) one.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/405573/
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
-
Submitted by Luben Tuikov

Rename "node" to "list" in struct drm_sched_job, in order to make it consistent with what we see being used throughout gpu_scheduler.h, for instance in struct drm_sched_entity, as well as with the rest of DRM and the kernel.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/403515/
Cc: Alexander Deucher <Alexander.Deucher@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Christian König <christian.koenig@amd.com>
-
- 17 November 2020, 1 commit
-
-
Submitted by Mauro Carvalho Chehab

Some identifiers have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format:

    identifier - description

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/12d4ca26f6843618200529ce5445063734d38c04.1605521731.git.mchehab+huawei@kernel.org
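For illustration, a well-formed kernel-doc block in that format looks like the following (a hypothetical function, not one taken from the patch):

    /**
     * my_function - one-line description of what it does
     * @arg: meaning of the argument
     *
     * Optional longer description, separated by a blank line.
     *
     * Return: what the caller gets back.
     */
    int my_function(int arg);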
-
- 13 November 2020, 1 commit
-
-
Submitted by Lee Jones

Fixes the following W=1 kernel build warning(s):

    drivers/gpu/drm/scheduler/sched_main.c:74: warning: Function parameter or member 'sched' not described in 'drm_sched_rq_init'

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 19 August 2020, 1 commit
-
-
Submitted by Luben Tuikov

Remove DRM_SCHED_PRIORITY_LOW, as it was used in only one place. Rename DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT and separate it with a blank line, as it represents a (total) count of said priorities and is used as such in loops throughout the code (with 0-based indexing, one past the last priority is the count). Remove the redundant word HIGH from priority names, and rename *KERNEL* to *HIGH*, as it really means just that: high.

v2: Add back KERNEL and remove SW and HW, in lieu of a single HIGH between NORMAL and KERNEL.

Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
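Roughly the shape of the enum after this patch (a sketch; see include/drm/gpu_scheduler.h for the authoritative version):

    enum drm_sched_priority {
            DRM_SCHED_PRIORITY_MIN,
            DRM_SCHED_PRIORITY_NORMAL,
            DRM_SCHED_PRIORITY_HIGH,
            DRM_SCHED_PRIORITY_KERNEL,

            DRM_SCHED_PRIORITY_COUNT    /* total count, used as a loop bound */
    };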
-
- 26 June 2020, 1 commit
-
-
Submitted by Nirmoy Das

This patch uses a score to select a new drm scheduler for better load balance between multiple drm schedulers, instead of num_jobs. Below are test results after running amdgpu_test ~10 times.

Before this patch:

    sched_name   number of times it got scheduled
    ==========   ================================
    sdma0        1463
    sdma1        198
    comp_1.0.1   280

After this patch:

    sched_name   number of times it got scheduled
    ==========   ================================
    sdma0        925
    sdma1        928
    comp_1.0.1   177
    comp_1.1.1   44
    comp_1.2.1   43
    comp_1.3.1   44

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/373000/
Signed-off-by: Christian König <christian.koenig@amd.com>
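A hedged sketch of the selection logic (the helper name is hypothetical and the atomic score member is simplified): pick the scheduler with the lowest score rather than the lowest num_jobs.

    static struct drm_gpu_scheduler *
    pick_least_loaded(struct drm_gpu_scheduler **sched_list,
                      unsigned int num_scheds)
    {
            struct drm_gpu_scheduler *best = NULL;
            unsigned int min_score = UINT_MAX, score, i;

            for (i = 0; i < num_scheds; i++) {
                    score = atomic_read(&sched_list[i]->score);
                    if (score < min_score) {
                            min_score = score;
                            best = sched_list[i];
                    }
            }
            return best;
    }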
-
- 15 June 2020, 1 commit
-
-
Submitted by Peter Zijlstra

Because SCHED_FIFO is a broken scheduler model (see previous patches), take away the priority field; the kernel can't possibly make an informed decision. In this case, use fifo_low, because the thread only cares about being above SCHED_NORMAL. Effectively no change in behaviour.

Cc: alexander.deucher@amd.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
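A minimal sketch of the change inside the scheduler thread, assuming the priority was previously set with sched_setscheduler() on the current task:

    static int drm_sched_main(void *param)
    {
            struct drm_gpu_scheduler *sched = param;

            /* was: sched_setscheduler(current, SCHED_FIFO, &sparam);
             * now just ask for "FIFO, above SCHED_NORMAL" and let the
             * kernel pick the actual priority value. */
            sched_set_fifo_low(current);

            /* ... main scheduling loop ... */
            return 0;
    }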
-
- 15 April 2020, 1 commit
-
-
Submitted by Christian König

We are racing to initialize sched->thread here, so just always check the current thread.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Link: https://patchwork.freedesktop.org/patch/361303/
-
- 26 March 2020, 2 commits
-
-
Submitted by Yintian Tao

There is one corner case in dma_fence_signal_locked which will raise a NULL pointer problem, as below:

    ->dma_fence_signal
      ->dma_fence_signal_locked
        ->test_and_set_bit

Here dma_fence_release is triggered, because the fence refcount drops to zero:

    ->dma_fence_put
      ->dma_fence_release
        ->drm_sched_fence_release_scheduled
          ->call_rcu

Here the union field "cb_list" of the finished fence is clobbered, because struct rcu_head contains two pointers, which is the same size as struct list_head cb_list.

Therefore, hold a reference on the finished fence in drm_sched_process_job to prevent the NULL pointer dereference while the finished fence is in dma_fence_signal.

    [  732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
    [  732.914815] #PF: supervisor write access in kernel mode
    [  732.915731] #PF: error_code(0x0002) - not-present page
    [  732.916621] PGD 0 P4D 0
    [  732.917072] Oops: 0002 [#1] SMP PTI
    [  732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
    [  732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
    [  732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
    [  732.938569] Call Trace:
    [  732.939003]  <IRQ>
    [  732.939364]  dma_fence_signal+0x29/0x50
    [  732.940036]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
    [  732.940996]  drm_sched_process_job+0x34/0xa0 [gpu_sched]
    [  732.941910]  dma_fence_signal_locked+0x85/0x100
    [  732.942692]  dma_fence_signal+0x29/0x50
    [  732.943457]  amdgpu_fence_process+0x99/0x120 [amdgpu]
    [  732.944393]  sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]

v2: hold the finished fence in drm_sched_process_job instead of amdgpu_fence_process
v3: restore the blank line

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
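A hedged sketch of the fix in drm_sched_process_job(), with the bookkeeping elided: take a reference on the finished fence across the signal call so the fence cannot be released while dma_fence_signal is still using it.

    static void drm_sched_process_job(struct dma_fence *f,
                                      struct dma_fence_cb *cb)
    {
            struct drm_sched_job *s_job =
                    container_of(cb, struct drm_sched_job, cb);
            struct drm_sched_fence *s_fence = s_job->s_fence;
            struct drm_gpu_scheduler *sched = s_fence->sched;

            /* ... hw_rq_count/num_jobs bookkeeping elided ... */

            dma_fence_get(&s_fence->finished);  /* keep the fence alive */
            drm_sched_fence_finished(s_fence);
            dma_fence_put(&s_fence->finished);  /* safe: signaling is done */

            wake_up_interruptible(&sched->wake_up_worker);
    }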
-
Submitted by Yintian Tao

There is one corner case in dma_fence_signal_locked which will raise a NULL pointer problem, as below:

    ->dma_fence_signal
      ->dma_fence_signal_locked
        ->test_and_set_bit

Here dma_fence_release is triggered, because the fence refcount drops to zero:

    ->dma_fence_put
      ->dma_fence_release
        ->drm_sched_fence_release_scheduled
          ->call_rcu

Here the union field "cb_list" of the finished fence is clobbered, because struct rcu_head contains two pointers, which is the same size as struct list_head cb_list.

Therefore, hold a reference on the finished fence in drm_sched_process_job to prevent the NULL pointer dereference while the finished fence is in dma_fence_signal.

    [  732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
    [  732.914815] #PF: supervisor write access in kernel mode
    [  732.915731] #PF: error_code(0x0002) - not-present page
    [  732.916621] PGD 0 P4D 0
    [  732.917072] Oops: 0002 [#1] SMP PTI
    [  732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
    [  732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
    [  732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
    [  732.938569] Call Trace:
    [  732.939003]  <IRQ>
    [  732.939364]  dma_fence_signal+0x29/0x50
    [  732.940036]  drm_sched_fence_finished+0x12/0x20 [gpu_sched]
    [  732.940996]  drm_sched_process_job+0x34/0xa0 [gpu_sched]
    [  732.941910]  dma_fence_signal_locked+0x85/0x100
    [  732.942692]  dma_fence_signal+0x29/0x50
    [  732.943457]  amdgpu_fence_process+0x99/0x120 [amdgpu]
    [  732.944393]  sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]

v2: hold the finished fence in drm_sched_process_job instead of amdgpu_fence_process
v3: restore the blank line

Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 17 March 2020, 2 commits
-
-
Submitted by Nirmoy Das

Remove drm_sched_entity_get_free_sched() and use the logic of picking the least loaded drm scheduler from a drm scheduler list to implement drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so that it can be utilized by other drm drivers.

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by changzhu

This patch needs to be reverted to avoid an amdgpu_test compute hang problem on Picasso.

This reverts commit 56822db1.

Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 13 March 2020, 2 commits
-
-
Submitted by Lucas Stach

Commit 1db8c142 ("drm/scheduler: Add drm_sched_suspend/resume_timeout()") made the job_list_lock IRQ safe, as the suspend/resume calls were expected to be called from IRQ context. This usage never materialized upstream. Instead, amdgpu started locking the job_list_lock in an IRQ-unsafe way in amdgpu_ib_preempt_mark_partial_job() and amdgpu_ib_preempt_job_recovery(), which leads to a potential deadlock if one were to actually start calling the drm_sched_suspend/resume_timeout functions from IRQ context. As no current user needs the locking to be IRQ safe, the local IRQ disable/enable is pure overhead. Fix the inconsistent locking by changing all uses of job_list_lock to the IRQ-unsafe locking primitives.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
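A representative before/after of the locking change (one call site sketched; the same substitution applies everywhere job_list_lock is taken):

    /* before: IRQ safe, pure overhead for all current users */
    spin_lock_irqsave(&sched->job_list_lock, flags);
    list_add_tail(&s_job->node, &sched->ring_mirror_list);
    spin_unlock_irqrestore(&sched->job_list_lock, flags);

    /* after: plain spinlock */
    spin_lock(&sched->job_list_lock);
    list_add_tail(&s_job->node, &sched->ring_mirror_list);
    spin_unlock(&sched->job_list_lock);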
-
Submitted by Robert Beckett

Add a new trace event to show when jobs are run on the HW.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 17 January 2020, 1 commit
-
-
Submitted by Nirmoy Das

This patch uses score-based logic to select a new rq for better load balance between multiple rq/scheds instead of num_jobs. Below are test results after running amdgpu_test from mesa drm.

Before this patch:

    sched_name   number of times it got scheduled
    ==========   ================================
    sdma0        314
    sdma1        32
    comp_1.0.0   56
    comp_1.0.1   0
    comp_1.1.0   0
    comp_1.1.1   0
    comp_1.2.0   0
    comp_1.2.1   0
    comp_1.3.0   0
    comp_1.3.1   0

After this patch:

    sched_name   number of times it got scheduled
    ==========   ================================
    sdma0        216
    sdma1        185
    comp_1.0.0   39
    comp_1.0.1   9
    comp_1.1.0   12
    comp_1.1.1   0
    comp_1.2.0   12
    comp_1.2.1   0
    comp_1.3.0   12
    comp_1.3.1   0

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 27 November 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Problem: Due to a race between drm_sched_cleanup_jobs in the sched thread and drm_sched_job_timedout in the timeout work, there is a possibility that the bad job was already freed while still being accessed from the timeout thread.

Fix: Instead of just peeking at the bad job in the mirror list, remove it from the list under lock and then put it back later, when we are guaranteed that no race with the main sched thread is possible, which is after the thread is parked.

v2: Lock around processing ring_mirror_list in drm_sched_cleanup_jobs.
v3: Rebase on top of drm-misc-next. v2 is not needed anymore as drm_sched_get_cleanup_job already has a lock there.
v4: Fix comments to reflect latest code in drm-misc.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Tested-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/342356
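A hedged sketch of the timeout-handler side of the fix (simplified from drm_sched_job_timedout; list and member names as in this era of the code):

    spin_lock(&sched->job_list_lock);
    job = list_first_entry_or_null(&sched->ring_mirror_list,
                                   struct drm_sched_job, node);
    if (job)
            /* off the list: the cleanup in the sched thread can no
             * longer free it behind our back */
            list_del_init(&job->node);
    spin_unlock(&sched->job_list_lock);

    /* ... park the sched thread and run timeout handling; the job is
     * put back on the list once no race with the thread is possible ... */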
-
- 08 November 2019, 2 commits
-
-
Submitted by Andrey Grodzovsky

When the sched thread is parked, we assume ring_mirror_list is not accessed from here.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Andrey Grodzovsky

Removes the thread park/unpark hack from drm_sched_entity_fini and by this fixes reactivation of the scheduler thread while the thread is supposed to be stopped.

v2: Per-sched-entity completion.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 07 November 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Fix a static code checker warning.

v2: Drop PTR_ERR_OR_ZERO.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 30 October 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Problem: When run_job fails and the HW fence returned is NULL, we still signal the s_fence to avoid hangs, but the user has no way of knowing if the actual HW job was run and finished.

Fix: Allow .run_job implementations to return an ERR_PTR in the fence pointer returned and then set this error on the s_fence->finished fence, so whoever waits on this fence can inspect the signaled fence for an error.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
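A hedged sketch of the error propagation in the scheduler main loop; the surrounding control flow is elided:

    fence = sched->ops->run_job(sched_job);
    if (IS_ERR(fence))
            /* record why the job never ran, on the fence users wait on */
            dma_fence_set_error(&s_fence->finished, PTR_ERR(fence));

    /* s_fence->finished still gets signaled so waiters don't hang,
     * but they can now inspect dma_fence->error afterwards. */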
-
- 26 October 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Problem: When run_job fails and the HW fence returned is NULL, we still signal the s_fence to avoid hangs, but the user has no way of knowing if the actual HW job was run and finished.

Fix: Allow .run_job implementations to return an ERR_PTR in the fence pointer returned and then set this error on the s_fence->finished fence, so whoever waits on this fence can inspect the signaled fence for an error.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 25 October 2019, 1 commit
-
-
Submitted by Steven Price

drm_sched_cleanup_jobs() attempts to free finished jobs; however, because it is called as the condition of wait_event_interruptible() it must not sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep. Instead, let's rename drm_sched_cleanup_jobs() to drm_sched_get_cleanup_job() and simply return a job for processing if there is one. The caller can then call the free_job() callback outside the wait_event_interruptible(), where sleeping is possible, before re-checking and returning to sleep if necessary.

Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Fixes: 5918045c ("drm/scheduler: rework job destruction")
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/337652/
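A hedged sketch of the resulting main-loop shape (other wakeup conditions elided): the job is picked up inside the wait condition, but freed outside it, where the driver's free_job() callback may sleep.

    struct drm_sched_job *cleanup_job = NULL;

    wait_event_interruptible(sched->wake_up_worker,
                             (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
                             /* ... or a runnable entity is available ... */
                             kthread_should_stop());

    if (cleanup_job)
            sched->ops->free_job(cleanup_job);    /* sleeping allowed here */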
-
- 16 July 2019, 1 commit
-
-
Submitted by Sam Ravnborg

Drop use of the deprecated drmP.h header file. Fix the fallout.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Sharat Masetty <smasetty@codeaurora.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190630061922.7254-26-sam@ravnborg.org
-
- 30 May 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Document the missing parameters.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1559140180-6762-1-git-send-email-andrey.grodzovsky@amd.com
-
- 25 May 2019, 1 commit
-
-
Submitted by Andrey Grodzovsky

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1558533443-7795-1-git-send-email-andrey.grodzovsky@amd.com
-
- 22 May 2019, 1 commit
-
-
Submitted by Erico Nunes

After commit 5918045c ("drm/scheduler: rework job destruction"), jobs are only deleted when the timeout handler is able to be cancelled successfully. In case no timeout handler is running (timeout == MAX_SCHEDULE_TIMEOUT), job cleanup would be skipped, which may result in memory leaks. Add handling for the (timeout == MAX_SCHEDULE_TIMEOUT) case in drm_sched_cleanup_jobs.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/306025/?series=60878&rev=2
-
- 03 May 2019, 3 commits
-
-
Submitted by Andrey Grodzovsky

Problem: The sched thread's cleanup function races against the TO (timeout) handler and removes the guilty job from the mirror list, and we have no way of differentiating whether the job was removed from within the TO handler or from the sched thread's cleanup function.

Fix: Add a flag to the scheduler to hint to the TO handler that the guilty job needs to be explicitly released.

v2: whitespace fix

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-5-git-send-email-andrey.grodzovsky@amd.com
-
Submitted by Andrey Grodzovsky

For the driver's later reference, to see if the fence is signaled.

v2: Move the parent fence put to resubmit jobs.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-4-git-send-email-andrey.grodzovsky@amd.com
-
Submitted by Christian König

We now destroy finished jobs from the worker thread to make sure that we never destroy a job currently in timeout processing. By this we avoid holding the lock around the ring mirror list in drm_sched_stop, which should solve a deadlock reported by a user.

v2: Remove unused variable.
v4: Move guilty job free into sched code.
v5: Move sched->hw_rq_count to drm_sched_start to account for the counter decrement in drm_sched_stop even when we don't call resubmit jobs if the guilty job did signal.
v6: remove unused variable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
Acked-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1555599624-12285-3-git-send-email-andrey.grodzovsky@amd.com
-
- 24 April 2019, 1 commit
-
-
Submitted by Jonathan Neuschäfer

Since commit 222b5f04 ("drm/sched: Refactor ring mirror list handling."), drm_sched_hw_job_reset is no longer there, so let's adjust the doc comment accordingly.

Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 26 January 2019, 2 commits
-
-
Submitted by Andrey Grodzovsky

Expedite job deletion from the ring mirror list to the HW fence signal callback, instead of from finish_work. Together with waiting for all such fences to signal in drm_sched_stop, we guarantee that an already-signaled job will not be processed twice. Remove the sched finish fence callback and just submit finish_work directly from the HW fence callback.

v2: Fix comments.
v3: Attach hw fence cb to sched_job
v5: Rebase

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
Submitted by Andrey Grodzovsky

Decouple sched thread stop/start and ring mirror list handling from the policy of what to do about the guilty jobs. When stopping the sched thread and detaching sched fences from non-signaled HW fences, wait for all signaled HW fences to complete before rerunning the jobs.

v2: Fix resubmission of guilty job into HW after refactoring.
v4: Full restart for all the jobs, not only from the guilty ring. Extract karma increase into a standalone function.
v5: Rework waiting for signaled jobs without relying on the job struct itself, as those might already be freed for non-'guilty' jobs' schedulers. Expose karma increase to drivers.
v6: Use list_for_each_entry_safe_continue and drm_sched_process_job in case the fence is already signaled. Call drm_sched_increase_karma only once for amdgpu and add documentation.
v7: Wait only for the latest job's fence.

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 06 December 2018, 2 commits
-
-
Submitted by Sharat Masetty

This patch adds two new functions to help client drivers suspend and resume the scheduler job timeout. This can be useful in cases where the hardware has preemption support enabled. Using this, it is possible to have the timeout active only for the ring which is active on the ringbuffer. This patch also makes the job_list_lock IRQ safe.

Suggested-by: Christian Koenig <Christian.Koenig@amd.com>
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
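A hedged usage sketch for a driver with hardware preemption ('ring' is a hypothetical driver object; the two drm_sched_* calls are the new API):

    unsigned long remaining;

    /* this ring is being switched out: stop its timeout ticking */
    remaining = drm_sched_suspend_timeout(&ring->sched);

    /* ... another ring runs on the ringbuffer ... */

    /* this ring is active again: rearm with the unelapsed time */
    drm_sched_resume_timeout(&ring->sched, remaining);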
-
Submitted by Sharat Masetty

In cases where the scheduler instance is used as a base object of another driver object, it's not clear if the driver can call scheduler cleanup on the fail path. So, set sched->thread to NULL, so that the driver can safely call drm_sched_fini() during cleanup.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
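A sketch of the failure path in drm_sched_init() after this change (the error message text is illustrative):

    sched->thread = kthread_run(drm_sched_main, sched, sched->name);
    if (IS_ERR(sched->thread)) {
            ret = PTR_ERR(sched->thread);
            sched->thread = NULL;   /* so a later drm_sched_fini() is safe */
            DRM_ERROR("Failed to create scheduler for %s.\n", name);
            return ret;
    }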
-
- 29 November 2018, 1 commit
-
-
Submitted by Christian König

This reverts commit 0efd2d2f. It's still causing problems for V3D.

v2: keep rearming the timeout.

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 20 November 2018, 1 commit
-
-
Submitted by Trigger Huang

A bad job is the one that triggered TDR (in the current amdgpu implementation, actually all the jobs in the current job queue will be treated as bad jobs). In the recovery process, its fence will be fake-signaled, and as a result, the work behind will be scheduled to delete it from the mirror list. But if the TDR process is invoked before the work's execution, this bad job might be processed again, and calling dma_fence_set_error on its fence in the TDR process will lead to a kernel warning trace:

    [  143.033605] WARNING: CPU: 2 PID: 53 at ./include/linux/dma-fence.h:437 amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
    kernel: [  143.033606] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 snd_hda_codec_generic crypto_simd glue_helper cryptd snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev snd_seq_device snd_timer snd soundcore binfmt_misc input_leds mac_hid serio_raw nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 8139too floppy psmouse 8139cp mii i2c_piix4 pata_acpi
    [  143.033649] CPU: 2 PID: 53 Comm: kworker/2:1 Tainted: G OE 4.15.0-20-generic #21-Ubuntu
    [  143.033650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
    [  143.033653] Workqueue: events drm_sched_job_timedout [amd_sched]
    [  143.033656] RIP: 0010:amddrm_sched_job_recovery+0x1af/0x1c0 [amd_sched]
    [  143.033657] RSP: 0018:ffffa9f880fe7d48 EFLAGS: 00010202
    [  143.033659] RAX: 0000000000000007 RBX: ffff9b98f2b24c00 RCX: ffff9b98efef4f08
    [  143.033660] RDX: ffff9b98f2b27400 RSI: ffff9b98f2b24c50 RDI: ffff9b98efef4f18
    [  143.033660] RBP: ffffa9f880fe7d98 R08: 0000000000000001 R09: 00000000000002b6
    [  143.033661] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9b98efef3430
    [  143.033662] R13: ffff9b98efef4d80 R14: ffff9b98efef4e98 R15: ffff9b98eaf91c00
    [  143.033663] FS: 0000000000000000(0000) GS:ffff9b98ffd00000(0000) knlGS:0000000000000000
    [  143.033664] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  143.033665] CR2: 00007fc49c96d470 CR3: 000000001400a005 CR4: 00000000003606e0
    [  143.033669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  143.033669] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  143.033670] Call Trace:
    [  143.033744]  amdgpu_device_gpu_recover+0x144/0x820 [amdgpu]
    [  143.033788]  amdgpu_job_timedout+0x9b/0xa0 [amdgpu]
    [  143.033791]  drm_sched_job_timedout+0xcc/0x150 [amd_sched]
    [  143.033795]  process_one_work+0x1de/0x410
    [  143.033797]  worker_thread+0x32/0x410
    [  143.033799]  kthread+0x121/0x140
    [  143.033801]  ? process_one_work+0x410/0x410
    [  143.033803]  ? kthread_create_worker_on_cpu+0x70/0x70
    [  143.033806]  ret_from_fork+0x35/0x40

So just delete the bad job from the mirror list directly.

Changes in v3:
- Add a helper function to delete the bad jobs from the mirror list and call it directly *before* the job's fence is signaled.

Changes in v2:
- Delete the useless list node check.
- Also delete bad jobs in drm_sched_main, because kthread_unpark(ring->sched.thread) will be invoked very early, before amdgpu_device_gpu_recover returns, so drm_sched_main will have a chance to pick up a new job from the job queue. This new job will be added into the mirror list and processed by amdgpu_job_run, but may not be deleted from the mirror list on time due to the same reason, and would finally be re-processed by drm_sched_job_recovery.

Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Christian König <chrstian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-
- 06 November 2018, 1 commit
-
-
Submitted by Sharat Masetty

This patch adds a new API to clean up the scheduler job resources. This is primarily needed in cases where the job was created but not queued to the scheduler queue. Additionally, with this change, the layer which creates the scheduler job also gets to free up the job's resources, and this entails moving the dma_fence_put(finished_fence) to the drivers' ops free handler routines.

Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
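A hedged usage sketch: a job that was initialized but never pushed to the entity can now be torn down by the layer that created it. Everything except the drm_sched_* calls is hypothetical.

    ret = drm_sched_job_init(&job->base, entity, owner);
    if (ret)
            return ret;

    ret = my_driver_prepare_job(job);               /* hypothetical step */
    if (ret) {
            drm_sched_job_cleanup(&job->base);      /* the new cleanup API */
            return ret;
    }

    drm_sched_entity_push_job(&job->base, entity);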
-