1. 23 3月, 2019 8 次提交
  2. 22 3月, 2019 9 次提交
    • C
      drm/i915: Allow contexts to share a single timeline across all engines · ea593dbb
      Chris Wilson 提交于
      Previously, our view has been always to run the engines independently
      within a context. (Multiple engines happened before we had contexts and
      timelines, so they always operated independently and that behaviour
      persisted into contexts.) However, at the user level the context often
      represents a single timeline (e.g. GL contexts) and userspace must
      ensure that the individual engines are serialised to present that
      ordering to the client (or forgot about this detail entirely and hope no
      one notices - a fair ploy if the client can only directly control one
      engine themselves ;)
      
      In the next patch, we will want to construct a set of engines that
      operate as one, that have a single timeline interwoven between them, to
      present a single virtual engine to the user. (They submit to the virtual
      engine, then we decide which engine to execute on based.)
      
      To that end, we want to be able to create contexts which have a single
      timeline (fence context) shared between all engines, rather than multiple
      timelines.
      
      v2: Move the specialised timeline ordering to its own function.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-4-chris@chris-wilson.co.uk
      ea593dbb
    • C
      drm/i915: Extend CONTEXT_CREATE to set parameters upon construction · b9171541
      Chris Wilson 提交于
      It can be useful to have a single ioctl to create a context with all
      the initial parameters instead of a series of create + setparam + setparam
      ioctls. This extension to create context allows any of the parameters
      to be passed in as a linked list to be applied to the newly constructed
      context.
      
      v2: Make a local copy of user setparam (Tvrtko)
      v3: Use flags to detect availability of extension interface
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-3-chris@chris-wilson.co.uk
      b9171541
    • C
      drm/i915: Create/destroy VM (ppGTT) for use with contexts · e0695db7
      Chris Wilson 提交于
      In preparation to making the ppGTT binding for a context explicit (to
      facilitate reusing the same ppGTT between different contexts), allow the
      user to create and destroy named ppGTT.
      
      v2: Replace global barrier for swapping over the ppgtt and tlbs with a
      local context barrier (Tvrtko)
      v3: serialise with struct_mutex; it's lazy but required dammit
      v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
      similarly, turned out different!)
      
      v5: Fix up test unwind for aliasing-ppgtt (snb)
      v6: Tighten language for uapi struct drm_i915_gem_vm_control.
      v7: Patch the context image for runtime ppgtt switching!
      
      Testcase: igt/gem_vm_create
      Testcase: igt/gem_ctx_param/vm
      Testcase: igt/gem_ctx_clone/vm
      Testcase: igt/gem_ctx_shared
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-2-chris@chris-wilson.co.uk
      e0695db7
    • C
      drm/i915: Introduce the i915_user_extension_method · 9d1305ef
      Chris Wilson 提交于
      An idea for extending uABI inspired by Vulkan's extension chains.
      Instead of expanding the data struct for each ioctl every time we need
      to add a new feature, define an extension chain instead. As we add
      optional interfaces to control the ioctl, we define a new extension
      struct that can be linked into the ioctl data only when required by the
      user. The key advantage being able to ignore large control structs for
      optional interfaces/extensions, while being able to process them in a
      consistent manner.
      
      In comparison to other extensible ioctls, the key difference is the
      use of a linked chain of extension structs vs an array of tagged
      pointers. For example,
      
      struct drm_amdgpu_cs_chunk {
              __u32           chunk_id;
              __u32           length_dw;
              __u64           chunk_data;
      };
      
      struct drm_amdgpu_cs_in {
              __u32           ctx_id;
              __u32           bo_list_handle;
              __u32           num_chunks;
              __u32           _pad;
              __u64           chunks;
      };
      
      allows userspace to pass in array of pointers to extension structs, but
      must therefore keep constructing that array along side the command stream.
      In dynamic situations like that, a linked list is preferred and does not
      similar from extra cache line misses as the extension structs themselves
      must still be loaded separate to the chunks array.
      
      v2: Apply the tail call optimisation directly to nip the worry of stack
      overflow in the bud.
      v3: Defend against recursion.
      v4: Fixup local types to match new uabi
      
      Opens:
      - do we include the result as an out-field in each chain?
      struct i915_user_extension {
      	__u64 next_extension;
      	__u64 name;
      	__s32 result;
      	__u32 mbz; /* reserved for future use */
      };
      * Undecided, so provision some room for future expansion.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NTvrtko Ursulin <tvrtko.ursulin@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190322092325.5883-1-chris@chris-wilson.co.uk
      9d1305ef
    • S
      drm/i915/guc: GuC suspend path cleanup · b9d52d38
      Sujaritha Sundaresan 提交于
      Adding a call to intel_uc_suspend in i915_gem_suspend, which
      is a common point for the suspend/resume and hibernate paths.
      This fixes an unbalanced call that causes issues with the CTB
      register/deregister.
      
      v2: Making the call unconditional (Daniele)
      	Moving the call to after the GEM_BUG_ON (Chris)
      
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Signed-off-by: NSujaritha Sundaresan <sujaritha.sundaresan@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321203804.6845-1-sujaritha.sundaresan@intel.com
      b9d52d38
    • C
      drm/i915/selftests: Mark up preemption tests for hang detection · e70d3d80
      Chris Wilson 提交于
      Use the igt_live_test framework for detecting whether an unwanted hang
      occurred during test execution, and report failure if it does.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321194031.20240-2-chris@chris-wilson.co.uk
      e70d3d80
    • C
      drm/i915/selftests: Calculate maximum ring size for preemption chain · d067994c
      Chris Wilson 提交于
      32 is too many for the likes of kbl, and in order to insert that many
      requests into the ring requires us to declare the first few hung --
      understandably a slow and unexpected process. Instead, measure the size
      of a singe requests and use that to estimate the upper bound on the
      chain length we can use for our test, remembering to flush the previous
      chain between tests for safety.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: N"Yokoyama, Caz" <caz.yokoyama@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321194031.20240-1-chris@chris-wilson.co.uk
      d067994c
    • C
      drm/i915: Skip object locking around a no-op set-domain ioctl · 754a2544
      Chris Wilson 提交于
      If we are already in the desired write domain of a set-domain ioctl,
      then there is nothing for us to do and we can quickly return back to
      userspace, avoiding any lock contention. By recognising that the
      write_domain is always a subset of the read_domains, and excluding the
      no-op case of requiring 0 read_domains in the ioctl, we can infer if the
      current write_domain matches the target read_domains, there is nothing
      for us to do.
      
      Secondary aspect of this is that we undo the arbitrary fetching and
      potential flushing of all pages for a set-domain(.write=CPU) call on a
      fresh object -- which was introduced simply because we do the get-pages
      before taking the struct_mutex.
      
      References: 40e62d5d ("drm/i915: Acquire the backing storage outside of struct_mutex in set-domain")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Reviewed-by: NMatthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-2-chris@chris-wilson.co.uk
      754a2544
    • C
      drm/i915: Flush pages on acquisition · a679f58d
      Chris Wilson 提交于
      When we return pages to the system, we ensure that they are marked as
      being in the CPU domain since any external access is uncontrolled and we
      must assume the worst. This means that we need to always flush the pages
      on acquisition if we need to use them on the GPU, and from the beginning
      have used set-domain. Set-domain is overkill for the purpose as it is a
      general synchronisation barrier, but our intent is to only flush the
      pages being swapped in. If we move that flush into the pages acquisition
      phase, we know then that when we have obj->mm.pages, they are coherent
      with the GPU and need only maintain that status without resorting to
      heavy handed use of set-domain.
      
      The principle knock-on effect for userspace is through mmap-gtt
      pagefaulting. Our uAPI has always implied that the GTT mmap was async
      (especially as when any pagefault occurs is unpredicatable to userspace)
      and so userspace had to apply explicit domain control itself
      (set-domain). However, swapping is transparent to the kernel, and so on
      first fault we need to acquire the pages and make them coherent for
      access through the GTT. Our use of set-domain here leaks into the uABI
      that the first pagefault was synchronous. This is unintentional and
      baring a few igt should be unoticed, nevertheless we bump the uABI
      version for mmap-gtt to reflect the change in behaviour.
      
      Another implication of the change is that gem_create() is presumed to
      create an object that is coherent with the CPU and is in the CPU write
      domain, so a set-domain(CPU) following a gem_create() would be a minor
      operation that merely checked whether we could allocate all pages for
      the object. On applying this change, a set-domain(CPU) causes a clflush
      as we acquire the pages. This will have a small impact on mesa as we move
      the clflush here on !llc from execbuf time to create, but that should
      have minimal performance impact as the same clflush exists but is now
      done early and because of the clflush issue, userspace recycles bo and
      so should resist allocating fresh objects.
      
      Internally, the presumption that objects are created in the CPU
      write-domain and remain so through writes to obj->mm.mapping is more
      prevalent than I expected; but easy enough to catch and apply a manual
      flush.
      
      For the future, we should push the page flush from the central
      set_pages() into the callers so that we can more finely control when it
      is applied, but for now doing it one location is easier to validate, at
      the cost of sometimes flushing when there is no need.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
      Cc: Antonio Argenziano <antonio.argenziano@intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NMatthew Auld <matthew.william.auld@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk
      a679f58d
  3. 21 3月, 2019 16 次提交
  4. 20 3月, 2019 7 次提交