1. 27 7月, 2018 1 次提交
    • J
      drm/i915/guc: Move the pin bias value from GuC to GGTT · dd18cedf
      Jakub Bartmiński 提交于
      Removing the pin bias from GuC allows us to not check for GuC every time
      we pin a context, which fixes the assertion error on unresolved GuC
      platform default in mock contexts selftest.
      
      It also seems that we were using uninitialized WOPCM variables when
      setting the GuC pin bias. The pin bias has to be set after the WOPCM,
      but before the call to i915_gem_contexts_init where the first contexts
      are pinned.
      
      v2:
      This also makes it so that there's no need to set GuC variables from
      within the WOPCM init function or to move the WOPCM init, while keeping
      the correct initialization order. Also for mock tests the pin bias is
      left at 0 and we make sure that the pin bias with GuC will not be
      smaller than without GuC.
      
      v3:
      Avoid unused i915 in intel_guc_ggtt_offset if debug is disabled.
      
      v4:
      Squash with WOPCM init reordering.
      Moved the i915_ggtt_pin_bias helper to this patch, and made some
      functions use it instead of directly dereferencing i915->ggtt.
      
      v5:
      Since we now don't use wopcm.guc.base for the pin bias there's no need to
      validate it. It also has already been verified in WOPCM init.
      
      v6:
      Deleted the now unnecessarily introduced includes from previous versions.
      Dropped naming changes from dev_priv to i915 for better patch readability.
      
      v7:
      Changed some comments to make more sense in the context they're in.
      
      v8:
      Moved and renamed the function which now returns the wopcm.guc.size to
      intel_guc.c:intel_guc_reserved_gtt_size to avoid any possible confusion
      with the pin_bias in ggtt, which should be used for pinning.
      Fixed patch not applying or the most recent upstream.
      
      Fixes: f7dc0157 ("drm/i915/uc: Fetch GuC/HuC firmwares from guc/huc specific init")
      Testcase: igt/drv_selftest/mock_contexts #GuC
      Signed-off-by: NJakub Bartmiński <jakub.bartminski@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michał Winiarski <michal.winiarski@intel.com>
      Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180727141148.30874-3-jakub.bartminski@intel.com
      dd18cedf
  2. 20 7月, 2018 3 次提交
  3. 13 7月, 2018 4 次提交
  4. 11 7月, 2018 1 次提交
  5. 09 7月, 2018 1 次提交
    • C
      drm/i915: Provide a timeout to i915_gem_wait_for_idle() · ec625fb9
      Chris Wilson 提交于
      Usually we have no idea about the upper bound we need to wait to catch
      up with userspace when idling the device, but in a few situations we
      know the system was idle beforehand and can provide a short timeout in
      order to very quickly catch a failure, long before hangcheck kicks in.
      
      In the following patches, we will use the timeout to curtain two overly
      long waits, where we know we can expect the GPU to complete within a
      reasonable time or declare it broken.
      
      In particular, with a broken GPU we expect it to fail during the initial
      GPU setup where do a couple of context switches to record the defaults.
      This is a task that takes a few milliseconds even on the slowest of
      devices, but we may have to wait 60s for hangcheck to give in and
      declare the machine inoperable. In this a case where any gpu hang is
      unacceptable, both from a timeliness and practical standpoint.
      
      The other improvement is that in selftests, we do not need to arm an
      independent timer to inject a wedge, as we can just limit the timeout on
      the wait directly.
      
      v2: Include the timeout parameter in the trace.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NMika Kuoppala <mika.kuoppala@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk
      ec625fb9
  6. 07 7月, 2018 1 次提交
  7. 06 7月, 2018 2 次提交
  8. 05 7月, 2018 2 次提交
  9. 15 6月, 2018 4 次提交
  10. 14 6月, 2018 1 次提交
  11. 13 6月, 2018 1 次提交
  12. 12 6月, 2018 3 次提交
  13. 11 6月, 2018 1 次提交
    • C
      drm/i915/ringbuffer: Fix context restore upon reset · b3ee09a4
      Chris Wilson 提交于
      The discovery with trying to enable full-ppgtt was that we were
      completely failing to the load both the mm and context following the
      reset. Although we were performing mmio to set the PP_DIR (per-process
      GTT) and CCID (context), these were taking no effect (the assumption was
      that this would trigger reload of the context and restore the page
      tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
      following context switch would anything occur.
      
      Since we are then required to reset the context image and PP_DIR using
      CS commands, we place those commands into every batch. The hardware
      should recognise the no-ops and eliminate the expensive context loads,
      but we still have to pay the cost of using cross-powerwell register
      writes. In practice, this has no effect on actual context switch times,
      and only adds a few hundred nanoseconds to no-op switches. We can improve
      the latter by eliminating the w/a around known no-op switches, but there
      is an ulterior motive to keeping them.
      
      Always emitting the context switch at the beginning of the request (and
      relying on HW to skip unneeded switches) does have one key advantage.
      Should we implement request reordering on Haswell, we will not know in
      advance what the previous executing context was on the GPU and so we
      would not be able to elide the MI_SET_CONTEXT commands ourselves and
      always have to emit them. Having our hand forced now actually prepares
      us for later.
      
      Now since that context and mm follow the request, we no longer (and not
      for a long time since requests took over!) require a trace point to tell
      when we write the switch into the ring, since it is always. (This is
      even more important when you remember that simply writing into the ring
      bears no relation to the current mm.)
      
      v2: Sandybridge has to agree to use LRI as well.
      
      Testcase: igt/drv_selftests/live_hangcheck
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
      Cc: Matthew Auld <matthew.william.auld@gmail.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
      b3ee09a4
  14. 09 6月, 2018 4 次提交
  15. 08 6月, 2018 3 次提交
  16. 07 6月, 2018 1 次提交
  17. 06 6月, 2018 1 次提交
  18. 05 6月, 2018 1 次提交
  19. 04 6月, 2018 1 次提交
  20. 01 6月, 2018 2 次提交
  21. 24 5月, 2018 1 次提交
    • C
      drm/i915: Limit searching for PIN_HIGH · eb479f86
      Chris Wilson 提交于
      To no surprise (since we've flip-flopped over the use of PIN_HIGH a few
      times), doing a search by address over a pathologically fragmented
      address space is exceeding slow. To protect ourselves from nearly
      unbounded latency (think searching a million holes while under
      struct_mutex), limit the search for the highest available hole and
      fallback to best-fit if it fails.
      
      In the pathologically fragmented case, such as igt/gem_ctx_thrash, the
      effect is dramatic, bringing the runtime down from hours to seconds
      (depending on how many other slow searches you hit, e.g. alloc_iova()
      and alloc_vmap_area() both degrade to a slow rbtree walk after their
      small cache is exhausted). For the real world, the number of search
      steps is unlikely to be significant as we should only need to search
      once per new context.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-3-chris@chris-wilson.co.uk
      eb479f86
  22. 22 5月, 2018 1 次提交