1. 28 May 2019 (4 commits)
  2. 22 May 2019 (2 commits)
    • drm/i915: Allow specification of parallel execbuf · a88b6e4c
      Authored by Chris Wilson
      There is a desire to split a task onto two engines and have them run at
      the same time, e.g. scanline interleaving to spread the workload evenly.
      Through the use of the out-fence from the first execbuf, we can
      coordinate a secondary execbuf to become ready only simultaneously with
      the first, so that, with all things idle, the second execbuf is executed
      in parallel with the first. The key difference here between the new
      EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
      waits for the completion of the first request (so that all of its
      rendering results are visible to the second execbuf, the more common
      userspace fence requirement).
      
      Since we only have a single input fence slot, userspace cannot mix an
      in-fence and a submit-fence. It has to use one or the other! This is not
      such a harsh requirement, since by virtue of the submit-fence, the
      secondary execbuf inherits all of the dependencies from the first
      request, and for the application the dependencies should be common
      between the primary and secondary execbuf.
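      The submit-fence vs in-fence distinction can be sketched as a toy model in plain C (an illustration of the semantics only, not the driver's scheduler; all names here are invented):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of request readiness; illustrative only, not the i915 scheduler. */
enum fence_mode { FENCE_IN, FENCE_SUBMIT };

struct request {
    bool submitted;   /* handed over to the HW */
    bool completed;   /* finished executing on the HW */
};

/* With EXEC_FENCE_SUBMIT the second execbuf becomes ready when the first is
 * submitted, so both can run in parallel; with EXEC_FENCE_IN it must wait
 * for the first to complete so its rendering results are visible. */
static bool second_is_ready(const struct request *first, enum fence_mode mode)
{
    return mode == FENCE_SUBMIT ? first->submitted : first->completed;
}
```

      With the submit-fence semantics the second batch becomes a candidate as soon as the first is submitted, which is what permits the scanline-interleaving use case above.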
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Testcase: igt/gem_exec_fence/parallel
      Link: https://github.com/intel/media-driver/pull/546
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-10-chris@chris-wilson.co.uk
      a88b6e4c
    • drm/i915: Allow a context to define its set of engines · 976b55f0
      Authored by Chris Wilson
      Over the last few years, we have debated how to extend the user API to
      support an increase in the number of engines, that may be sparse and
      even be heterogeneous within a class (not all video decoders created
      equal). We settled on using (class, instance) tuples to identify a
      specific engine, with an API for the user to construct a map of engines
      to capabilities. Into this picture, we then add a challenge of virtual
      engines; one user engine that maps behind the scenes to any number of
      physical engines. To keep it general, we want the user to have full
      control over that mapping. To that end, we allow the user to constrain a
      context to define the set of engines that it can access, order fully
      controlled by the user via (class, instance). With such precise control
      in context setup, we can continue to use the existing execbuf uABI of
      specifying a single index; only now it doesn't automagically map onto
      the engines, it uses the user defined engine map from the context.
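      The change in execbuf index resolution can be sketched as a minimal model (hypothetical names, not the actual i915 structures): the index now selects from the user-defined (class, instance) map stored in the context.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of a user-defined engine map; hypothetical names, not the
 * i915 uAPI structures. */
struct engine_id {
    unsigned engine_class;    /* e.g. render, copy, video */
    unsigned engine_instance; /* which engine within that class */
};

struct toy_context {
    const struct engine_id *engines; /* map supplied at context setup */
    size_t num_engines;
};

/* The execbuf index no longer selects a fixed engine: it indexes the map
 * the user configured.  Out-of-range (or an empty engines[]) yields NULL. */
static const struct engine_id *
lookup_engine(const struct toy_context *ctx, size_t idx)
{
    return idx < ctx->num_engines ? &ctx->engines[idx] : NULL;
}
```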
      
      v2: Fixup freeing of local on success of get_engines()
      v3: Allow empty engines[]
      v4: s/nengine/num_engines/
      v5: Replace 64 limit on num_engines with a note that execbuf is
      currently limited to only using the first 64 engines.
      v6: Actually use the engines_mutex to guard the ctx->engines.
      
      Testcase: igt/gem_ctx_engines
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk
      976b55f0
  3. 27 April 2019 (2 commits)
  4. 25 April 2019 (1 commit)
  5. 03 April 2019 (1 commit)
    • i915, uaccess: Fix redundant CLAC · 8f4faed0
      Authored by Peter Zijlstra
      New tooling noticed this:
      
       drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x3c: redundant UACCESS disable
       drivers/gpu/drm/i915/i915_gem_execbuffer.o: warning: objtool: .altinstr_replacement+0x66: redundant UACCESS disable
      
      You don't need user_access_end() if user_access_begin() fails.
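      The pairing rule behind the warning can be illustrated with mocked primitives (invented names; the real user_access_begin()/user_access_end() map to STAC/CLAC on x86):

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked uaccess primitives tracking the STAC/CLAC balance; invented names,
 * illustrative only. */
static int uaccess_depth; /* >0 while user access is open (SMAP disabled) */

static bool mock_uaccess_begin(bool range_ok)
{
    if (!range_ok)
        return false;  /* failed begin: protection was never disabled */
    uaccess_depth++;   /* STAC */
    return true;
}

static void mock_uaccess_end(void)
{
    uaccess_depth--;   /* CLAC */
}

/* Correct shape: user_access_end() only on the path where begin succeeded. */
static int copy_something(bool range_ok)
{
    if (!mock_uaccess_begin(range_ok))
        return -14;    /* -EFAULT; no mock_uaccess_end() here */
    /* ... unsafe user accesses ... */
    mock_uaccess_end();
    return 0;
}
```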
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8f4faed0
  6. 22 March 2019 (1 commit)
    • drm/i915: Flush pages on acquisition · a679f58d
      Authored by Chris Wilson
      When we return pages to the system, we ensure that they are marked as
      being in the CPU domain since any external access is uncontrolled and we
      must assume the worst. This means that we need to always flush the pages
      on acquisition if we need to use them on the GPU, and from the beginning
      have used set-domain. Set-domain is overkill for the purpose as it is a
      general synchronisation barrier, but our intent is to only flush the
      pages being swapped in. If we move that flush into the pages acquisition
      phase, we know then that when we have obj->mm.pages, they are coherent
      with the GPU and need only maintain that status without resorting to
      heavy handed use of set-domain.
      
      The principal knock-on effect for userspace is through mmap-gtt
      pagefaulting. Our uAPI has always implied that the GTT mmap was async
      (especially as when any pagefault occurs is unpredictable to userspace)
      and so userspace had to apply explicit domain control itself
      (set-domain). However, swapping is transparent to the kernel, and so on
      first fault we need to acquire the pages and make them coherent for
      access through the GTT. Our use of set-domain here leaks into the uABI
      that the first pagefault was synchronous. This is unintentional and,
      barring a few igt, should go unnoticed; nevertheless we bump the uABI
      version for mmap-gtt to reflect the change in behaviour.
      
      Another implication of the change is that gem_create() is presumed to
      create an object that is coherent with the CPU and is in the CPU write
      domain, so a set-domain(CPU) following a gem_create() would be a minor
      operation that merely checked whether we could allocate all pages for
      the object. On applying this change, a set-domain(CPU) causes a clflush
      as we acquire the pages. This will have a small impact on mesa as we move
      the clflush here on !llc from execbuf time to create, but that should
      have minimal performance impact as the same clflush exists but is now
      done early and because of the clflush issue, userspace recycles bo and
      so should resist allocating fresh objects.
      
      Internally, the presumption that objects are created in the CPU
      write-domain and remain so through writes to obj->mm.mapping is more
      prevalent than I expected; but easy enough to catch and apply a manual
      flush.
      
      For the future, we should push the page flush from the central
      set_pages() into the callers so that we can more finely control when it
      is applied, but for now doing it in one location is easier to validate, at
      the cost of sometimes flushing when there is no need.
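      The acquisition-time flush can be sketched as a toy state machine (illustrative only, hypothetical names): pages are flushed once when acquired for GPU use and are assumed CPU-dirty again only after being released back to the system.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object model: pages returned to the system are assumed CPU-dirty, so
 * they are flushed exactly once when (re)acquired for GPU use.  This mirrors
 * the idea of the patch only; it is not the driver's code. */
struct toy_obj {
    bool pages_acquired;
    bool gpu_coherent;
    int  flush_count;
};

static void acquire_pages(struct toy_obj *obj)
{
    if (!obj->pages_acquired) {
        obj->flush_count++;       /* clflush on acquisition */
        obj->gpu_coherent = true; /* coherent from here on; no set-domain */
        obj->pages_acquired = true;
    }
}

static void release_pages(struct toy_obj *obj)
{
    /* back to the system: assume the worst, i.e. CPU domain again */
    obj->pages_acquired = false;
    obj->gpu_coherent = false;
}
```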
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Antonio Argenziano <antonio.argenziano@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
      a679f58d
  7. 08 March 2019 (1 commit)
  8. 06 March 2019 (1 commit)
  9. 02 March 2019 (1 commit)
  10. 28 February 2019 (1 commit)
  11. 22 February 2019 (1 commit)
  12. 08 February 2019 (1 commit)
    • drm/i915: Hack and slash, throttle execbuffer hogs · d6f328bf
      Authored by Chris Wilson
      Apply backpressure to hogs that emit requests faster than the GPU can
      process them by waiting for their ring to be less than half-full before
      proceeding with taking the struct_mutex.
      
      This is a gross hack to apply throttling backpressure; the long-term
      goal is to remove the struct_mutex contention so that each client
      naturally waits, preferably in an asynchronous, nonblocking fashion
      (pipelined operations for the win), for their own resources and never
      blocks another client within the driver at least. (Realtime priority
      goals would extend to ensuring that resource contention favours high
      priority clients as well.)
      
      This patch only limits excessive request production and does not attempt
      to throttle clients that block waiting for eviction (either global GTT or
      system memory) or any other global resources, see above for the long term
      goal.
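      The half-full throttle test reduces to a simple predicate; a minimal sketch with hypothetical field names, not the i915 ring structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the backpressure test: before taking the heavyweight lock, a
 * client waits until its ring is less than half full.  Field names are
 * illustrative only. */
struct toy_ring {
    unsigned size;   /* total ring size in bytes */
    unsigned used;   /* bytes currently occupied by emitted requests */
};

/* True when the client should wait before emitting more requests. */
static bool ring_needs_throttle(const struct toy_ring *ring)
{
    return ring->used >= ring->size / 2;
}
```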
      
      No microbenchmarks are harmed (to the best of my knowledge).
      
      Testcase: igt/gem_exec_schedule/pi-ringfull-*
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: John Harrison <John.C.Harrison@Intel.com>
      Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190207071829.5574-1-chris@chris-wilson.co.uk
      d6f328bf
  13. 30 January 2019 (1 commit)
    • drm/i915: Identify active requests · 85474441
      Authored by Chris Wilson
      To allow requests to forgo a common execution timeline, one question we
      need to be able to answer is "is this request running?". To track
      whether a request has started on HW, we can emit a breadcrumb at the
      beginning of the request and check its timeline's HWSP to see if the
      breadcrumb has advanced past the start of this request. (This is in
      contrast to the global timeline where we need only ask if we are on the
      global timeline and if the timeline has advanced past the end of the
      previous request.)
      
      There is still confusion from a preempted request, which has already
      started but relinquished the HW to a high priority request. For the
      common case, this discrepancy should be negligible. However, for
      identification of hung requests, knowing which one was running at the
      time of the hang will be much more important.
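      The breadcrumb test reduces to a seqno comparison against the timeline's HWSP value; a sketch, assuming the usual wrap-safe signed-delta idiom for seqno comparisons (names invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy version of the "has this request started?" test: a breadcrumb emitted
 * at the start of each request is compared against the value the HW has
 * written to the timeline's HWSP (hardware status page).  The signed delta
 * keeps the comparison correct across seqno wraparound. */
static bool request_started(uint32_t hwsp_seqno, uint32_t start_breadcrumb)
{
    return (int32_t)(hwsp_seqno - start_breadcrumb) >= 0;
}
```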
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
      85474441
  14. 15 January 2019 (2 commits)
  15. 09 January 2019 (1 commit)
  16. 05 January 2019 (2 commits)
    • make 'user_access_begin()' do 'access_ok()' · 594cc251
      Authored by Linus Torvalds
      Originally, the rule used to be that you'd have to do access_ok()
      separately, and then user_access_begin() before actually doing the
      direct (optimized) user access.
      
      But experience has shown that people then decide not to do access_ok()
      at all, and instead rely on it being implied by other operations or
      similar.  Which makes it very hard to verify that the access has
      actually been range-checked.
      
      If you use the unsafe direct user accesses, hardware features (either
      SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
      Access Never - on ARM) do force you to use user_access_begin().  But
      nothing really forces the range check.
      
      By putting the range check into user_access_begin(), we actually force
      people to do the right thing (tm), and the range check will be visible
      near the actual accesses.  We have way too long a history of people
      trying to avoid them.
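      The shape of the change can be mocked in a few lines (the limit and names below are invented stand-ins, not the kernel's implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mocked sketch of the range check folded into user_access_begin().
 * The limit and names are invented; illustration only. */
#define MOCK_USER_LIMIT 0x10000000ull

static bool mock_range_ok(uint64_t addr, uint64_t size)
{
    uint64_t end = addr + size;
    return end >= addr && end <= MOCK_USER_LIMIT; /* no wrap, within limit */
}

/* After the change, begin() itself performs the access_ok() check, so a
 * caller cannot open unsafe access to an unvalidated range. */
static bool mock_user_access_begin(uint64_t addr, uint64_t size)
{
    if (!mock_range_ok(addr, size))
        return false;
    /* ... disable SMAP/PAN here (STAC) ... */
    return true;
}
```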
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      594cc251
    • i915: fix missing user_access_end() in page fault exception case · 0b2c8f8b
      Authored by Linus Torvalds
      When commit fddcd00a ("drm/i915: Force the slow path after a
      user-write error") unified the error handling for various user access
      problems, it didn't do the user_access_end() that is needed for the
      unsafe_put_user() case.
      
      It's not a huge deal: a missed user_access_end() will only mean that
      SMAP protection isn't active afterwards, and for the error case we'll be
      returning to user mode soon enough anyway.  But it's wrong, and adding
      the proper user_access_end() is trivial enough (and doing it for the
      other error cases where it isn't needed doesn't hurt).
      
      I noticed it while doing the same prep-work for changing
      user_access_begin() that precipitated the access_ok() changes in commit
      96d4f267 ("Remove 'type' argument from access_ok() function").
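      The fixed error path has a simple shape; a mocked sketch that tracks the SMAP state (invented names; -14 stands in for -EFAULT):

```c
#include <assert.h>
#include <stdbool.h>

/* Mocked sketch of the fix: once begin() has succeeded, every exit,
 * including the fault path from unsafe_put_user(), must pass through
 * end().  Illustrative only. */
static int smap_disabled; /* 1 between begin and end */

static void mock_begin(void) { smap_disabled = 1; }
static void mock_end(void)   { smap_disabled = 0; }

static int copy_relocs(bool fault)
{
    mock_begin();
    if (fault)
        goto out; /* the bug was returning here without the end() call */
    /* ... more unsafe user writes ... */
out:
    mock_end();   /* the fix: a shared exit that always re-enables SMAP */
    return fault ? -14 : 0;
}
```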
      
      Fixes: fddcd00a ("drm/i915: Force the slow path after a user-write error")
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: stable@kernel.org # v4.20
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b2c8f8b
  17. 04 January 2019 (1 commit)
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
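      What access_ok() is left checking once the type argument is gone can be sketched as a pure range test (mocked limit and names, not the kernel's actual implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of access_ok() after the change: a pure range test with no 'type'
 * argument, since the check never depended on read vs write.  The limit
 * and names are mocked for illustration. */
#define MOCK_TASK_SIZE 0x7ffffffff000ull

static bool mock_access_ok(uint64_t addr, uint64_t size)
{
    uint64_t end = addr + size;
    return end >= addr && end <= MOCK_TASK_SIZE; /* no wrap, below limit */
}
```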
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  18. 13 December 2018 (1 commit)
  19. 12 December 2018 (1 commit)
  20. 07 December 2018 (1 commit)
  21. 05 December 2018 (1 commit)
  22. 21 November 2018 (1 commit)
  23. 20 November 2018 (1 commit)
  24. 12 November 2018 (2 commits)
  25. 06 November 2018 (1 commit)
  26. 26 October 2018 (1 commit)
  27. 18 October 2018 (1 commit)
    • drm: add syncobj timeline support v9 · 48197bc5
      Authored by Chunming Zhou
      This patch is for the VK_KHR_timeline_semaphore extension; a semaphore is called a syncobj on the kernel side.
      This extension introduces a new type of syncobj that has an integer payload
      identifying a point in a timeline. Such timeline syncobjs support the
      following operations:
         * CPU query - A host operation that allows querying the payload of the
           timeline syncobj.
         * CPU wait - A host operation that allows a blocking wait for a
           timeline syncobj to reach a specified value.
         * Device wait - A device operation that allows waiting for a
           timeline syncobj to reach a specified value.
         * Device signal - A device operation that allows advancing the
           timeline syncobj to a specified value.
      
      v1:
      Since it's a timeline, an earlier time point (PT) is always signaled before a later PT.
      a. signal PT design:
      Signal PT fence N depends on PT[N-1]'s fence and the signal operation's fence; when PT[N]'s fence is
      signaled, the timeline advances to the value of PT[N].
      b. wait PT design:
      A wait PT fence is signaled when the timeline reaches its point value. As the timeline advances, each
      wait PT's value is compared with the new timeline value; if a PT's value is lower than the timeline
      value, that wait PT is signaled, otherwise it stays in the list. A syncobj wait operation can wait on
      any point of the timeline, so an RB tree is needed to order them. A wait PT can also be ahead of any
      signal PT, so we need a submission fence to handle that.
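      The v1 design above can be modeled in miniature (a flat array stands in for the RB tree of wait points; all names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy timeline syncobj: a monotonically increasing payload plus wait points
 * that become signaled once the payload reaches their value.  Illustrative
 * only; not the drm_syncobj implementation. */
#define MAX_WAITS 8

struct toy_timeline {
    uint64_t payload;                 /* current timeline value */
    uint64_t wait_value[MAX_WAITS];
    bool     wait_signaled[MAX_WAITS];
    size_t   num_waits;
};

/* CPU/device wait: register interest in the timeline reaching 'value'. */
static size_t timeline_add_wait(struct toy_timeline *t, uint64_t value)
{
    size_t i = t->num_waits++;
    t->wait_value[i] = value;
    t->wait_signaled[i] = t->payload >= value;
    return i;
}

/* Device signal: advance the payload and signal any waits it now covers. */
static void timeline_signal(struct toy_timeline *t, uint64_t value)
{
    if (value > t->payload)
        t->payload = value;
    for (size_t i = 0; i < t->num_waits; i++)
        if (t->wait_value[i] <= t->payload)
            t->wait_signaled[i] = true;
}
```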
      
      v2:
      1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
      2. move unexposed definitions to the .c file. (Daniel Vetter)
      3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
      4. split up the change to drm_syncobj_replace_fence() in a separate patch.
      5. drop the submission_fence implementation and instead use wait_event() for that. (Christian)
      6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)
      
      v3:
      1. replace the normal syncobj with the timeline implementation. (Vetter and Christian)
          a. a normal syncobj signal op creates a signal PT at the tail of the signal PT list.
          b. a normal syncobj wait op creates a wait PT bound to the last signal point; this wait PT is only signaled by the related signal PT.
      2. many bug fix and clean up
      3. stub fence moving is moved to other patch.
      
      v4:
      1. fix RB tree loop with while(node=rb_first(...)). (Christian)
      2. fix syncobj lifecycle. (Christian)
      3. only enable_signaling when there is wait_pt. (Christian)
      4. fix timeline path issues.
      5. write a timeline test in libdrm
      
      v5: (Christian)
      1. semaphore is called syncobj in kernel side.
      2. don't need 'timeline' characters in some function name.
      3. keep syncobj cb.
      
      v6: (Christian)
      1. merge syncobj_timeline to syncobj structure.
      2. simplify some check sentences.
      3. some misc change.
      4. fix CTS failed issue.
      
      v7: (Christian)
      1. error handling when creating signal pt.
      2. remove timeline naming in func.
      3. export flags in find_fence.
      4. allow reset timeline.
      
      v8:
      1. use wait_event_interruptible without timeout
      2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY
      
      v9:
      1. rename signal_pt->base to signal_pt->fence_array to avoid misleading
      2. improve kerneldoc
      
      individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
      timeline syncobj is tested by ./amdgpu_test -s 9
      Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
      Signed-off-by: Christian König <christian.koenig@amd.com>
      Cc: Christian Konig <christian.koenig@amd.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Daniel Rakos <Daniel.Rakos@amd.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
      Cc: Jason Ekstrand <jason@jlekstrand.net>
      Reviewed-by: Christian König <christian.koenig@amd.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/257258/
      48197bc5
  28. 12 September 2018 (2 commits)
  29. 06 September 2018 (1 commit)
  30. 04 September 2018 (1 commit)
  31. 03 September 2018 (1 commit)