1. 02 Jan 2023, 1 commit
  2. 16 Nov 2022, 1 commit
  3. 03 Nov 2022, 2 commits
  4. 25 Oct 2022, 1 commit
    •
      drm/scheduler: Set the FIFO scheduling policy as the default · 977d97f1
      Committed by Luben Tuikov
      The current default Round-Robin GPU scheduling policy can result in
      starvation of entities which have a large number of jobs, in favour of
      entities which have a very small number of jobs (single digit).
      
      This can be illustrated in the following diagram, where jobs are
      alphabetized to show their chronological order of arrival, where job A is
      the oldest, B is the second oldest, and so on, to J, the most recent job to
      arrive.
      
          ---> entities
      j | H-F-----A--E--I--
      o | --G-----B-----J--
      b | --------C--------
      s\/ --------D--------
      
      WLOG, assuming all jobs are "ready", a Round-Robin scheduler will
      execute them in the following order (a slice off the top of the
      entities' lists),
      
      H, F, A, E, I, G, B, J, C, D.
      
      However, to mitigate job starvation, we'd rather execute C and D before E,
      and so on, given, of course, that they're all ready to be executed.
      
      So, if all jobs are ready at this instant, the order of execution for
      this and the next 9 instances of picking the next job to execute
      should really be
      
      A, B, C, D, E, F, G, H, I, J,
      
      which is their chronological order. The only reason to break this
      order is when an older job is not yet ready but a younger job is, at
      the instant of picking a new job to execute. For instance, if job C
      wasn't ready at time 2 but job D was, then we'd pick job D, like this:
      
      0 +1 +2  ...
      A, B, D, ...
      
      And from then on, C would be preferred over all other jobs, whenever
      it is ready at the time a new job is picked for execution. So, if C
      became ready two steps later, the execution order would look like
      this:
      
      0  +1 +2 +3 +4 ...
      A, B, D, E, C, F, G, H, I, J
      
      This is what the FIFO GPU scheduling algorithm achieves. It uses a
      Red-Black tree to keep jobs sorted in chronological order, where picking
      the oldest job is O(1) (we use the "cached" structure), and balancing the
      tree is O(log n). IOW, it picks the *oldest ready* job to execute now.
      
      The implementation is already in the kernel, and this commit only
      changes the default GPU scheduling policy to FIFO.
      
      This was tested and achieves about 1% better performance than the
      Round-Robin algorithm.
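The two orderings, and the oldest-ready rule, can be reproduced with a small model (plain Python, not kernel code; the entity layout is read off the diagram above, and `min()` stands in for the rb-tree's cached-leftmost pick):

```python
from collections import deque

# Entities read off the diagram: each deque holds one entity's jobs in
# arrival order (A is globally oldest, J the newest).
def make_entities():
    return [deque('H'), deque('FG'), deque('ABCD'), deque('E'), deque('IJ')]

def round_robin(entities):
    """One job per entity per pass: 'a slice off the top'."""
    order = []
    while any(entities):
        for e in entities:
            if e:
                order.append(e.popleft())
    return order

def fifo(entities, ready_at=None):
    """Pick the oldest *ready* job each step; min() models the
    rb-tree's cached-leftmost (O(1)) lookup."""
    ready_at = ready_at or {}
    pending = {job for e in entities for job in e}
    order, now = [], 0
    while pending:
        oldest = min(j for j in pending if ready_at.get(j, 0) <= now)
        order.append(oldest)
        pending.remove(oldest)
        now += 1
    return order

rr_order = round_robin(make_entities())            # H F A E I G B J C D
fifo_order = fifo(make_entities())                 # A..J, chronological
c_late = fifo(make_entities(), ready_at={'C': 4})  # A B D E C F G H I J
```

The third call reproduces the "C became ready two steps later" timeline from the commit message.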
      
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Alex Deucher <Alexander.Deucher@amd.com>
      Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20221024212634.27230-1-luben.tuikov@amd.com
      Signed-off-by: Christian König <christian.koenig@amd.com>
      977d97f1
  5. 07 Oct 2022, 1 commit
    •
      Revert "drm/sched: Use parent fence instead of finished" · bafaf67c
      Committed by Dave Airlie
      This reverts commit e4dc45b1.
      
      This is causing instability on Linus' desktop, and I'm seeing
      oopses with VK CTS runs.
      
      netconsole got me the following oops:
      [ 1234.778760] BUG: kernel NULL pointer dereference, address: 0000000000000088
      [ 1234.778782] #PF: supervisor read access in kernel mode
      [ 1234.778787] #PF: error_code(0x0000) - not-present page
      [ 1234.778791] PGD 0 P4D 0
      [ 1234.778798] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [ 1234.778803] CPU: 7 PID: 805 Comm: systemd-journal Not tainted 6.0.0+ #2
      [ 1234.778809] Hardware name: System manufacturer System Product
      Name/PRIME X370-PRO, BIOS 5603 07/28/2020
      [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
      [ 1234.778828] Code: aa 0f 1d ce e9 57 ff ff ff 48 89 d7 e8 9d 8f 3f
      ce e9 4a ff ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 53
      48 89 fb <48> 8b af 88 00 00 00 f0 ff 8d f0 00 00 00 48 8b 85 80 01 00
      00 f0
      [ 1234.778834] RSP: 0000:ffffabe680380de0 EFLAGS: 00010087
      [ 1234.778839] RAX: ffffffffc04e9230 RBX: 0000000000000000 RCX: 0000000000000018
      [ 1234.778897] RDX: 00000ba278e8977a RSI: ffff953fb288b460 RDI: 0000000000000000
      [ 1234.778901] RBP: ffff953fb288b598 R08: 00000000000000e0 R09: ffff953fbd98b808
      [ 1234.778905] R10: 0000000000000000 R11: ffffabe680380ff8 R12: ffffabe680380e00
      [ 1234.778908] R13: 0000000000000001 R14: 00000000ffffffff R15: ffff953fbd9ec458
      [ 1234.778912] FS:  00007f35e7008580(0000) GS:ffff95428ebc0000(0000)
      knlGS:0000000000000000
      [ 1234.778916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1234.778919] CR2: 0000000000000088 CR3: 000000010147c000 CR4: 00000000003506e0
      [ 1234.778924] Call Trace:
      [ 1234.778981]  <IRQ>
      [ 1234.778989]  dma_fence_signal_timestamp_locked+0x6a/0xe0
      [ 1234.778999]  dma_fence_signal+0x2c/0x50
      [ 1234.779005]  amdgpu_fence_process+0xc8/0x140 [amdgpu]
      [ 1234.779234]  sdma_v3_0_process_trap_irq+0x70/0x80 [amdgpu]
      [ 1234.779395]  amdgpu_irq_dispatch+0xa9/0x1d0 [amdgpu]
      [ 1234.779609]  amdgpu_ih_process+0x80/0x100 [amdgpu]
      [ 1234.779783]  amdgpu_irq_handler+0x1f/0x60 [amdgpu]
      [ 1234.779940]  __handle_irq_event_percpu+0x46/0x190
      [ 1234.779946]  handle_irq_event+0x34/0x70
      [ 1234.779949]  handle_edge_irq+0x9f/0x240
      [ 1234.779954]  __common_interrupt+0x66/0x100
      [ 1234.779960]  common_interrupt+0xa0/0xc0
      [ 1234.779965]  </IRQ>
      [ 1234.779968]  <TASK>
      [ 1234.779971]  asm_common_interrupt+0x22/0x40
      [ 1234.779976] RIP: 0010:finish_mkwrite_fault+0x22/0x110
      [ 1234.779981] Code: 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 41 55 41
      54 55 48 89 fd 53 48 8b 07 f6 40 50 08 0f 84 eb 00 00 00 48 8b 45 30
      48 8b 18 <48> 89 df e8 66 bd ff ff 48 85 c0 74 0d 48 89 c2 83 e2 01 48
      83 ea
      [ 1234.779985] RSP: 0000:ffffabe680bcfd78 EFLAGS: 00000202
      
      Revert it for now and figure it out later.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      bafaf67c
  6. 05 Oct 2022, 1 commit
  7. 30 Sep 2022, 1 commit
    •
      drm/sched: Add FIFO sched policy to run queue · 08fb97de
      Committed by Andrey Grodzovsky
      When many entities compete for the same run queue on the same
      scheduler, we observe unusually long wait times and some jobs
      get starved. This has been observed with GPUVis.
      
      The issue is due to the Round-Robin policy used by the scheduler
      to pick the next entity's job queue for execution. Under the
      stress of many entities with long job queues, some jobs can be
      stuck for a very long time in their entity's queue before being
      popped and executed, while for other entities with smaller job
      queues a job might execute earlier even though it arrived later
      than the job in the long queue.
      
      Fix:
      Add a FIFO selection policy for entities in the run queue: choose
      the next entity on the run queue in such an order that if a job on
      one entity arrived earlier than a job on another entity, the first
      job starts executing earlier, regardless of the length of the
      entity's job queue.
      
      v2:
      Switch to an rb-tree structure for entities, based on the TS of the
      oldest job waiting in the entity's job queue. This improves
      next-entity extraction to O(1). An entity TS update is O(log N),
      where N is the number of entities in the run queue.
      
      Drop default option in module control parameter.
      
      v3:
      Various cosmetic fixes and minor refactoring of the fifo update function. (Luben)
      
      v4:
      Switch drm_sched_rq_select_entity_fifo to in order search (Luben)
      
      v5: Fix up drm_sched_rq_select_entity_fifo loop (Luben)
      
      v6: Add missing drm_sched_rq_remove_fifo_locked
      
      v7: Fix ts sampling bug and more cosmetic stuff (Luben)
      
      v8: Fix module parameter string (Luben)
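The v2 data structure can be sketched as follows: entities keyed by the TS of their oldest queued job, re-keyed after each pop (a Python model, not the driver code; a heap stands in for the kernel's rb-tree, and the timestamps and job names are invented):

```python
import heapq
from collections import deque

class RunQueue:
    """Entities ordered by the TS of their oldest pending job."""
    def __init__(self):
        self._heap = []   # (oldest_job_ts, seq, entity_jobs)
        self._seq = 0     # tie-breaker so deques are never compared

    def add_entity(self, jobs):
        """jobs: deque of (ts, name) pairs, already in arrival order."""
        if jobs:
            heapq.heappush(self._heap, (jobs[0][0], self._seq, jobs))
            self._seq += 1

    def select_entity_fifo(self):
        """Pick the entity with the oldest job; the re-push after
        popping a job models the O(log N) entity-TS update."""
        if not self._heap:
            return None
        _, _, jobs = self._heap[0] if False else heapq.heappop(self._heap)
        ts, name = jobs.popleft()
        self.add_entity(jobs)   # re-key the entity by its new oldest job
        return name

rq = RunQueue()
rq.add_entity(deque([(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D')]))
rq.add_entity(deque([(4, 'E')]))
rq.add_entity(deque([(5, 'F'), (6, 'G')]))
order = [rq.select_entity_fifo() for _ in range(7)]   # chronological
```

Regardless of which entity's queue a job sits in, jobs come out in arrival order, which is the starvation fix the commit describes.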
      
      Cc: Luben Tuikov <luben.tuikov@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org>
      Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Tested-by: Yunxiang Li (Teddy) <Yunxiang.Li@amd.com>
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220930041258.1050247-1-luben.tuikov@amd.com
      08fb97de
  8. 16 Sep 2022, 1 commit
  9. 07 Sep 2022, 1 commit
    •
      drm/scheduler: quieten kernel-doc warnings · f8ad757e
      Committed by Randy Dunlap
      Fix kernel-doc warnings in gpu_scheduler.h and sched_main.c.
      
      Quashes these warnings:
      
      include/drm/gpu_scheduler.h:332: warning: missing initial short description on line:
       * struct drm_sched_backend_ops
      include/drm/gpu_scheduler.h:412: warning: missing initial short description on line:
       * struct drm_gpu_scheduler
      include/drm/gpu_scheduler.h:461: warning: Function parameter or member 'dev' not described in 'drm_gpu_scheduler'
      
      drivers/gpu/drm/scheduler/sched_main.c:201: warning: missing initial short description on line:
       * drm_sched_dependency_optimized
      drivers/gpu/drm/scheduler/sched_main.c:995: warning: Function parameter or member 'dev' not described in 'drm_sched_init'
      
      Fixes: 2d33948e ("drm/scheduler: add documentation")
      Fixes: 8ab62eda ("drm/sched: Add device pointer to drm_gpu_scheduler")
      Fixes: 542cff78 ("drm/sched: Avoid lockdep spalt on killing a processes")
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Jiawei Gu <Jiawei.Gu@amd.com>
      Cc: dri-devel@lists.freedesktop.org
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220404213040.12912-1-rdunlap@infradead.org
      f8ad757e
  10. 19 Jul 2022, 1 commit
  11. 28 Jun 2022, 1 commit
  12. 07 Apr 2022, 1 commit
  13. 04 Apr 2022, 1 commit
  14. 23 Feb 2022, 1 commit
  15. 17 Nov 2021, 1 commit
  16. 16 Nov 2021, 1 commit
  17. 19 Oct 2021, 1 commit
  18. 07 Oct 2021, 1 commit
  19. 15 Sep 2021, 1 commit
    •
      drm/sched: fix the bug of time out calculation(v4) · bcf26654
      Committed by Monk Liu
      Issue:
      in cleanup_job, cancel_delayed_work cancels a TO (timeout) timer
      even when its corresponding job is still running.
      
      Fix:
      do not cancel the timer in cleanup_job; instead, do the cancelling
      only when the head job is signaled, and if there is a "next" job,
      start_timeout again.
      
      v2:
      further clean up the logic, and do the TDR timer cancelling only if
      the signaled job is the last one in its scheduler.
      
      v3:
      change the issue description;
      remove the cancel_delayed_work at the beginning of cleanup_job;
      restore the implementation of drm_sched_job_begin.
      
      v4:
      remove the kthread_should_park() check in the cleanup_job routine;
      we should clean up the signaled job ASAP.
      
      TODO:
      1) introduce pause/resume scheduler in job_timeout to serialize the
      handling of the scheduler and job_timeout.
      2) drop the bad job's del and insert in the scheduler due to the above
      serialization (no race issue anymore with the serialization).
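The intended timer behaviour, cancel only when the head job signals and re-arm if a "next" job exists, can be modeled as follows (an illustrative Python sketch, not the kernel implementation; `timer_armed` stands in for the delayed work item):

```python
from collections import deque

class Scheduler:
    """Models the fix: the TO timer follows the head of the pending
    list instead of being cancelled in cleanup_job."""
    def __init__(self):
        self.pending = deque()
        self.timer_armed = False

    def job_begin(self, job):
        self.pending.append(job)
        if not self.timer_armed:      # first pending job arms the timer
            self.timer_armed = True

    def head_job_signaled(self):
        """Called when the head job's HW fence signals."""
        self.pending.popleft()
        # Cancel the timer for the finished job...
        self.timer_armed = False
        # ...and start_timeout again iff there is a "next" job.
        if self.pending:
            self.timer_armed = True

s = Scheduler()
s.job_begin('job0'); s.job_begin('job1')
assert s.timer_armed        # job0 still running: timer must stay armed
s.head_job_signaled()
assert s.timer_armed        # re-armed for job1
s.head_job_signaled()
assert not s.timer_armed    # queue empty: timer cancelled
```

The key property is that cleanup never disarms the timer while an unfinished job still heads the queue, which is what the buggy cancel_delayed_work call did.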
      
      Tested-by: jingwen <jingwen.chen@amd.com>
      Signed-off-by: Monk Liu <Monk.Liu@amd.com>
      Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/1630457207-13107-1-git-send-email-Monk.Liu@amd.com
      bcf26654
  20. 07 Sep 2021, 1 commit
  21. 30 Aug 2021, 2 commits
    •
      drm/sched: Add dependency tracking · ebd5f742
      Committed by Daniel Vetter
      Instead of just a callback we can just glue in the gem helpers that
      panfrost, v3d and lima currently use. There really aren't that many
      ways to skin this cat.
      
      v2/3: Rebased.
      
      v4: Repaint this shed. The functions are now called _add_dependency()
      and _add_implicit_dependency()
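The effect of the dependency helpers can be sketched as a minimal dependency-gated job (a Python model; `add_dependency` mirrors the role of the new `drm_sched_job_add_dependency()`, and the `Fence`/`Job` classes are illustrative stand-ins, not the kernel types):

```python
class Fence:
    """Stand-in for a dma_fence."""
    def __init__(self):
        self.signaled = False
    def signal(self):
        self.signaled = True

class Job:
    """A scheduler job that only becomes runnable once every
    registered dependency fence has signaled."""
    def __init__(self, name):
        self.name = name
        self.dependencies = []
    def add_dependency(self, fence):
        # counterpart of drm_sched_job_add_dependency()
        self.dependencies.append(fence)
    def ready(self):
        return all(f.signaled for f in self.dependencies)

write_done = Fence()
job = Job('read-after-write')
job.add_dependency(write_done)
blocked = not job.ready()   # held back until the fence signals
write_done.signal()
runnable = job.ready()
```

An implicit dependency works the same way, except the fences are collected from a GEM object's reservation rather than passed in explicitly.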
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v3)
      Reviewed-by: Steven Price <steven.price@arm.com> (v1)
      Acked-by: Melissa Wen <mwen@igalia.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Cc: Lee Jones <lee.jones@linaro.org>
      Cc: Nirmoy Das <nirmoy.aiemd@gmail.com>
      Cc: Boris Brezillon <boris.brezillon@collabora.com>
      Cc: Luben Tuikov <luben.tuikov@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-5-daniel.vetter@ffwll.ch
      ebd5f742
    •
      drm/sched: Split drm_sched_job_init · dbe48d03
      Committed by Daniel Vetter
      This is a very confusingly named function, because not only does it
      init an object, it also arms it and provides a point of no return
      for pushing a job into the scheduler. It would be nice if that were
      a bit clearer in the interface.
      
      But the real reason is that I want to push the dependency tracking
      helpers into the scheduler code, and that means drm_sched_job_init
      must be called a lot earlier, without arming the job.
      
      v2:
      - don't change .gitignore (Steven)
      - don't forget v3d (Emma)
      
      v3: Emma noticed that I leak the memory allocated in
      drm_sched_job_init if we bail out before the point of no return in
      subsequent driver patches. To be able to fix this, change
      drm_sched_job_cleanup() so it can handle being called both before
      and after drm_sched_job_arm().
      
      Also improve the kerneldoc for this.
      
      v4:
      - Fix the drm_sched_job_cleanup logic, I inverted the booleans, as
        usual (Melissa)
      
      - Christian pointed out that drm_sched_entity_select_rq() also needs
        to be moved into drm_sched_job_arm, which made me realize that the
        job->id definitely needs to be moved too.
      
        Shuffle things to fit between job_init and job_arm.
      
      v5:
      Reshuffle the split between init/arm once more, amdgpu abuses
      drm_sched.ready to signal gpu reset failures. Also document this
      somewhat. (Christian)
      
      v6:
      Rebase on top of the msm drm/sched support. Note that the
      drm_sched_job_init() call is completely misplaced, and hence also the
      split-out drm_sched_entity_push_job(). I've put in a FIXME which the next
      patch will address.
      
      v7: Drop the FIXME in msm, after discussions with Rob I agree it shouldn't
      be a problem where it is now.
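The resulting two-phase lifecycle, with cleanup valid on both sides of arming (the v3/v4 point), can be modeled like this (a toy Python sketch, not the kernel API; `resources` and `freed_by` are invented names):

```python
class SchedJob:
    """Models the split: __init__ allocates (drm_sched_job_init), arm()
    is the point of no return (drm_sched_job_arm), and cleanup() must
    work both before and after arm -- the v3 fix."""
    def __init__(self):
        self.resources = object()   # stand-in for allocated job state
        self.armed = False
        self.freed_by = None

    def arm(self):
        # in the real split, select_rq and the job->id assignment
        # move here, after which the job belongs to the scheduler
        self.armed = True

    def cleanup(self):
        # callable both before and after arm()
        self.freed_by = 'caller' if not self.armed else 'scheduler'
        self.resources = None

early = SchedJob()      # driver bails out before pushing the job:
early.cleanup()         # no leak, caller frees directly

pushed = SchedJob()     # normal path: init -> add deps -> arm -> push
pushed.arm()
pushed.cleanup()        # post-arm teardown goes through the scheduler
```

The point of the split is exactly that the early-bail path exists at all: dependencies can be attached between init and arm without committing the job.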
      Acked-by: Christian König <christian.koenig@amd.com>
      Acked-by: Melissa Wen <mwen@igalia.com>
      Cc: Melissa Wen <melissa.srw@gmail.com>
      Acked-by: Emma Anholt <emma@anholt.net>
      Acked-by: Steven Price <steven.price@arm.com> (v2)
      Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5)
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: Russell King <linux+etnaviv@armlinux.org.uk>
      Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Qiang Yu <yuq825@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: "Christian König" <christian.koenig@amd.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Adam Borowski <kilobyte@angband.pl>
      Cc: Nick Terrell <terrelln@fb.com>
      Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Cc: Paul Menzel <pmenzel@molgen.mpg.de>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Nirmoy Das <nirmoy.das@amd.com>
      Cc: Deepak R Varma <mh12gx2825@gmail.com>
      Cc: Lee Jones <lee.jones@linaro.org>
      Cc: Kevin Wang <kevin1.wang@amd.com>
      Cc: Chen Li <chenli@uniontech.com>
      Cc: Luben Tuikov <luben.tuikov@amd.com>
      Cc: "Marek Olšák" <marek.olsak@amd.com>
      Cc: Dennis Li <Dennis.Li@amd.com>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Cc: Sonny Jiang <sonny.jiang@amd.com>
      Cc: Boris Brezillon <boris.brezillon@collabora.com>
      Cc: Tian Tao <tiantao6@hisilicon.com>
      Cc: etnaviv@lists.freedesktop.org
      Cc: lima@lists.freedesktop.org
      Cc: linux-media@vger.kernel.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: Emma Anholt <emma@anholt.net>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Sean Paul <sean@poorly.run>
      Cc: linux-arm-msm@vger.kernel.org
      Cc: freedreno@lists.freedesktop.org
      Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch
      dbe48d03
  22. 01 Jul 2021, 1 commit
  23. 28 Jun 2021, 1 commit
  24. 20 May 2021, 2 commits
  25. 05 May 2021, 1 commit
  26. 10 Apr 2021, 1 commit
    •
      drm/amd/amdgpu implement tdr advanced mode · e6c6338f
      Committed by Jack Zhang
      [Why]
      The previous TDR design treats the first job in job_timeout as the
      bad job. But sometimes a later bad compute job can block a good gfx
      job and cause an unexpected gfx job timeout, because the gfx and
      compute rings mutually share internal GC HW.
      
      [How]
      This patch implements an advanced TDR mode. It adds an additional
      synchronous pre-resubmit step (Step 0 Resubmit) before the normal
      resubmit step, in order to find the real bad job.
      
      1. At the Step 0 Resubmit stage, synchronously submit and wait for
      the first job to be signaled. If it times out, identify it as guilty
      and do a HW reset. After that, do the normal resubmit step to
      resubmit the remaining jobs.
      
      2. For a whole-GPU reset (VRAM lost), resubmit the old way.
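The Step 0 control flow can be sketched as follows (a hedged Python model of the described steps, not the amdgpu code; the job names, the hang set and the `hw_reset` callback are invented for illustration):

```python
def tdr_advanced(pending, run_and_wait, hw_reset):
    """Step 0: resubmit the head job alone and wait for it. A timeout
    marks it as the real guilty job and triggers a HW reset; the
    remaining jobs then go through the normal resubmit step."""
    guilty = None
    head, rest = pending[0], pending[1:]
    if not run_and_wait(head):      # Step 0 Resubmit timed out
        guilty = head
        hw_reset()
    for job in rest:                # normal resubmit step
        run_and_wait(job)
    return guilty

resets = []
hangs = {'bad_compute'}             # hypothetical hanging job
run = lambda job: job not in hangs  # True = signaled, False = timed out
guilty = tdr_advanced(['bad_compute', 'good_gfx'], run,
                      lambda: resets.append('hw reset'))
```

Here the later-submitted good gfx job is no longer blamed: the synchronous wait on the head job pins the timeout on the job that actually hangs.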
      
      v2: squash in build fix (Alex)
      Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      e6c6338f
  27. 10 Feb 2021, 2 commits
  28. 05 Feb 2021, 1 commit
  29. 29 Jan 2021, 1 commit
    •
      drm/scheduler: Job timeout handler returns status (v3) · a6a1f036
      Committed by Luben Tuikov
      This patch does not change current behaviour.
      
      The driver's job timeout handler now returns a status indicating
      back to the DRM layer whether the device (GPU) is no longer
      available, such as after it's been unplugged, or whether all is
      normal, i.e. current behaviour.
      
      All drivers which make use of the drm_sched_backend_ops'
      .timedout_job() callback have been accordingly renamed and return
      the would've-been default value of DRM_GPU_SCHED_STAT_NOMINAL to
      restart the task's timeout timer -- this is the old behaviour, and
      is preserved by this patch.
      
      v2: Use an enum as the status of a driver's job timeout callback
          method.
      
      v3: Return scheduler/device information, rather than task
          information.
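The new contract can be sketched in a small model (Python, not the kernel interface; NOMINAL is the value named in the commit, while the ENODEV member name and the reaction strings here are assumptions for illustration):

```python
from enum import Enum, auto

class DrmGpuSchedStat(Enum):
    """Models the status a driver's .timedout_job() now returns.
    NOMINAL is quoted in the commit; ENODEV is an assumed name for
    the device-gone case."""
    NOMINAL = auto()   # device fine: restart the task's timeout timer
    ENODEV = auto()    # device gone (e.g. unplugged): stop scheduling

def timedout_job(dev_present):
    """A driver callback: return the would've-been default NOMINAL
    unless the device has disappeared."""
    return DrmGpuSchedStat.NOMINAL if dev_present else DrmGpuSchedStat.ENODEV

def drm_layer_reaction(status):
    """Models how the DRM layer would act on the returned status."""
    return 'restart-timer' if status is DrmGpuSchedStat.NOMINAL else 'stop'

normal = drm_layer_reaction(timedout_job(dev_present=True))
unplugged = drm_layer_reaction(timedout_job(dev_present=False))
```

Returning NOMINAL everywhere is exactly why the patch preserves current behaviour: every existing driver keeps restarting the timer.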
      
      Cc: Alexander Deucher <Alexander.Deucher@amd.com>
      Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Lucas Stach <l.stach@pengutronix.de>
      Cc: Russell King <linux+etnaviv@armlinux.org.uk>
      Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
      Cc: Qiang Yu <yuq825@gmail.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Cc: Eric Anholt <eric@anholt.net>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
      Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Acked-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Link: https://patchwork.freedesktop.org/patch/415095/
      a6a1f036
  30. 19 Jan 2021, 1 commit
  31. 08 Dec 2020, 3 commits
  32. 17 Nov 2020, 1 commit
  33. 13 Nov 2020, 1 commit
  34. 19 Aug 2020, 1 commit